API Reference

Benchmark orchestration

Benchmark orchestration for simi-search LIT-PCBA experiments.

class simi_search.benchmark.LitPcbaTargetRepository(data_dir)[source]

Bases: object

Read processed LIT-PCBA target CSVs.

Parameters:

data_dir (Path)

discover_targets()[source]
Return type:

list[str]

read_split(path)[source]
Parameters:

path (Path)

Return type:

list[Molecule]

train_csv(target)[source]
Parameters:

target (str)

Return type:

Path

validation_csv(target)[source]
Parameters:

target (str)

Return type:

Path
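
As a rough illustration of what read_split(path) might do, the sketch below parses a CSV into Molecule-like records with the standard library. It is not the package's implementation: the column names are assumed to match the Molecule field names, and the local Molecule dataclass is a stand-in defined only to keep the example self-contained.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Molecule:
    """Stand-in for simi_search.models.Molecule (same field names)."""

    compound_id: str
    smiles: str
    label: int
    target: str
    split: str


def read_split(path: Path) -> list[Molecule]:
    """Read one processed split CSV into Molecule records.

    Assumption: the CSV header uses the Molecule field names as columns.
    """
    with path.open(newline="", encoding="utf-8") as handle:
        return [
            Molecule(
                compound_id=row["compound_id"],
                smiles=row["smiles"],
                label=int(row["label"]),  # CSV values arrive as strings
                target=row["target"],
                split=row["split"],
            )
            for row in csv.DictReader(handle)
        ]
```

Converting label to int at read time keeps the downstream metric functions, which take Sequence[int], free of string handling.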

class simi_search.benchmark.SimilarityBenchmarkRunner(*, repository, searcher=None)[source]

Bases: object

Run active-query similarity retrieval for one or more LIT-PCBA targets.

Parameters:
  • repository

  • searcher (defaults to None)

run(targets=None)[source]
Parameters:

targets (list[str] | None)

Return type:

list[SimilarityResult]

run_target(target)[source]
Parameters:

target (str)

Return type:

SimilarityResult

Fingerprints

Fingerprint implementations for ligand similarity search.

class simi_search.fingerprints.Fingerprinter[source]

Bases: ABC

Interface for molecule featurizers used by similarity search.

abstractmethod fingerprint(smiles)[source]
Parameters:

smiles (str)

Return type:

int

class simi_search.fingerprints.HashedSmilesFingerprint(*, n_bits=2048, min_ngram=2, max_ngram=4)[source]

Bases: Fingerprinter

Dependency-free hashed token fingerprint for SMILES similarity.

Parameters:
  • n_bits (int)

  • min_ngram (int)

  • max_ngram (int)

fingerprint(smiles)[source]
Parameters:

smiles (str)

Return type:

int
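
A hashed n-gram fingerprint of this kind can be sketched as follows. This is a minimal illustration, not the package's code: it assumes character-level n-grams and a BLAKE2 hash, and the function name hashed_ngram_fingerprint is hypothetical. The actual tokenization and hash used by HashedSmilesFingerprint may differ.

```python
import hashlib


def hashed_ngram_fingerprint(
    smiles: str, n_bits: int = 2048, min_ngram: int = 2, max_ngram: int = 4
) -> int:
    """Fold every character n-gram of a SMILES string into an integer bitset."""
    bits = 0
    for n in range(min_ngram, max_ngram + 1):
        # range() is empty when the string is shorter than n.
        for start in range(len(smiles) - n + 1):
            token = smiles[start:start + n]
            # A fixed hash keeps fingerprints reproducible across runs,
            # unlike the built-in hash(), which is salted per process.
            digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
            bits |= 1 << (int.from_bytes(digest, "big") % n_bits)
    return bits
```

Returning a plain int matches the documented return type and lets set operations on fingerprints be expressed with the bitwise operators & and |.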

class simi_search.fingerprints.RdkitMorganFingerprint(*, radius=2, n_bits=2048)[source]

Bases: Fingerprinter

RDKit Morgan/ECFP bit-vector fingerprinter.

RDKit is optional so that the base package remains lightweight. Install it with either of:

pip install "simi-search[rdkit]"
conda install -c conda-forge rdkit

Parameters:
  • radius (int)

  • n_bits (int)

fingerprint(smiles)[source]
Parameters:

smiles (str)

Return type:

int

simi_search.fingerprints.build_fingerprinter(name)[source]

Create a supported fingerprinter by CLI/config name.

Parameters:

name (str)

Return type:

Fingerprinter

Fingerprint selection

The command line exposes the same backends as the Python API:

benchmark-similarity --fingerprint hashed
benchmark-similarity --fingerprint rdkit

The rdkit backend uses Morgan fingerprints and requires RDKit to be installed; the hashed backend has no extra dependencies.

Similarity search

Object-oriented ligand similarity search implementations.

class simi_search.search.MaxActiveSimilaritySearch(*, fingerprinter=None, method_name='hashed_smiles', similarity=None)[source]

Bases: object

Score candidates by maximum similarity to active reference ligands.

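Parameters:
  • fingerprinter (defaults to None)

  • method_name (str, defaults to 'hashed_smiles')

  • similarity (defaults to None)

score(queries, candidates)[source]
Parameters:
  • queries

  • candidates

Return type: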

list[float]
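
The max-active scoring rule can be sketched on precomputed integer fingerprints. This is an illustration under assumptions: the helper names are hypothetical, the real score() may accept molecules or SMILES rather than fingerprints, and returning 0.0 when there are no queries is a choice made here.

```python
def tanimoto(left: int, right: int) -> float:
    """Tanimoto coefficient of two integer bitsets (0.0 if both are empty)."""
    union = bin(left | right).count("1")
    if union == 0:
        return 0.0
    return bin(left & right).count("1") / union


def max_active_scores(query_fps: list[int], candidate_fps: list[int]) -> list[float]:
    """Score each candidate by its best similarity to any active query."""
    return [
        max((tanimoto(candidate, query) for query in query_fps), default=0.0)
        for candidate in candidate_fps
    ]
```

Taking the maximum rather than the mean means a candidate ranks highly if it resembles any single known active.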

class simi_search.search.TanimotoSimilarity[source]

Bases: object

Tanimoto similarity over integer bitset fingerprints.

score(left, right)[source]
Parameters:
  • left (int)

  • right (int)

Return type:

float
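
For integer bitsets, the Tanimoto coefficient is the number of shared set bits divided by the number of bits set in either fingerprint. The sketch below shows that calculation; the function name is hypothetical, and returning 0.0 for two empty fingerprints is an assumption about how the degenerate case is handled.

```python
def tanimoto_bitset(left: int, right: int) -> float:
    """Return |left AND right| / |left OR right| over integer bitsets."""
    union = bin(left | right).count("1")
    if union == 0:
        # Both fingerprints are empty; the ratio is undefined, so use 0.0.
        return 0.0
    intersection = bin(left & right).count("1")
    return intersection / union
```

For example, 0b1100 and 0b1010 share one bit out of three set overall, giving a similarity of 1/3.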

Metrics

Ranking metrics for virtual-screening benchmarks.

class simi_search.metrics.RankingMetrics(compounds: int, actives: int, ef1: float, ef5: float, bedroc20: float, roc_auc: float, pr_auc: float, top1_actives: int, top5_actives: int)[source]

Bases: object

Parameters:
  • compounds (int)

  • actives (int)

  • ef1 (float)

  • ef5 (float)

  • bedroc20 (float)

  • roc_auc (float)

  • pr_auc (float)

  • top1_actives (int)

  • top5_actives (int)

actives: int
bedroc20: float
compounds: int
ef1: float
ef5: float
pr_auc: float
roc_auc: float
top1_actives: int
top5_actives: int

simi_search.metrics.bedroc(labels, alpha=20.0)[source]

Compute BEDROC using the Truchon and Bayly early-recognition formula.

Parameters:
  • labels (Sequence[int])

  • alpha (float)

Return type:

float
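
The BEDROC calculation can be sketched as a rescaled robust initial enhancement (RIE). This is a minimal illustration rather than the package's code. It assumes labels is already ordered from best-scored to worst-scored compound, which the signature suggests because no scores are passed. Returning 0.0 when there are no actives or only actives is also an assumption.

```python
import math
from typing import Sequence


def bedroc(labels: Sequence[int], alpha: float = 20.0) -> float:
    """BEDROC for labels given in ranked order (best-scored compound first)."""
    total = len(labels)
    actives = sum(1 for label in labels if label == 1)
    if actives == 0 or actives == total:
        return 0.0  # Degenerate case; the metric is undefined here.

    # Exponentially weighted sum over the 1-based ranks of the actives.
    weighted = sum(
        math.exp(-alpha * rank / total)
        for rank, label in enumerate(labels, start=1)
        if label == 1
    )
    # Expected weight of a single active placed at a uniformly random rank.
    random_weight = (1.0 / total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / total) - 1.0)
    rie = weighted / (actives * random_weight)

    # Rescale RIE between its worst-case and best-case values.
    ratio = actives / total
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)
```

With this rescaling, a ranking that places every active first evaluates to 1 and one that places every active last evaluates to 0, up to floating-point error.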

simi_search.metrics.compute_ranking_metrics(labels, scores)[source]
Parameters:
  • labels (Sequence[int])

  • scores (Sequence[float])

Return type:

RankingMetrics
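
Several functions in this module take only labels, which suggests that the labels are first ordered by descending score. A sketch of that ranking step is shown below. The helper name rank_labels is hypothetical, and breaking ties by input order is an assumption.

```python
from typing import Sequence


def rank_labels(labels: Sequence[int], scores: Sequence[float]) -> list[int]:
    """Return labels reordered from highest score to lowest."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # sorted() is stable, so tied scores keep their original relative order.
    order = sorted(range(len(scores)), key=lambda index: scores[index], reverse=True)
    return [labels[index] for index in order]
```

The ranked list can then be passed to label-only metrics such as an enrichment factor or BEDROC.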

simi_search.metrics.enrichment_factor(labels, fraction)[source]
Parameters:
  • labels (Sequence[int])

  • fraction (float)

Return type:

tuple[float, int]
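
An enrichment factor compares the hit rate in the top fraction of a ranked list with the hit rate over the whole list. The sketch below rests on several assumptions: labels is in ranked order (best first), the returned tuple is (enrichment factor, actives found in the top fraction), and the cutoff is rounded to the nearest whole compound with a minimum of one. The package may define these details differently.

```python
from typing import Sequence


def enrichment_factor(labels: Sequence[int], fraction: float) -> tuple[float, int]:
    """Return (EF, hits) for the top `fraction` of a ranked label list."""
    total = len(labels)
    if total == 0:
        return 0.0, 0
    actives = sum(labels)
    top_n = max(1, round(fraction * total))  # Always inspect at least one compound.
    hits = sum(labels[:top_n])
    if actives == 0:
        return 0.0, hits  # No actives at all, so nothing can be enriched.
    return (hits / top_n) / (actives / total), hits
```

For a ten-compound list with two actives, one of which falls in the top 20%, the hit rate in the selection is 0.5 against an overall rate of 0.2, so the enrichment factor is 2.5.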

simi_search.metrics.pr_auc(labels)[source]
Parameters:

labels (Sequence[int])

Return type:

float

simi_search.metrics.roc_auc(labels, scores)[source]
Parameters:
  • labels (Sequence[int])

  • scores (Sequence[float])

Return type:

float
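
ROC AUC equals the probability that a randomly chosen active scores higher than a randomly chosen inactive, with ties counted as one half. The sketch below computes it from average ranks (the Mann-Whitney U statistic). It is an illustration only, and returning 0.5 when one class is missing is an assumption.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC AUC with tied scores sharing their average rank."""
    positives = sum(1 for label in labels if label == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        return 0.5  # Undefined with a single class; use the chance level.

    order = sorted(range(len(scores)), key=lambda index: scores[index])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the group
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    positive_rank_sum = sum(rank for rank, label in zip(ranks, labels) if label == 1)
    u_statistic = positive_rank_sum - positives * (positives + 1) / 2
    return u_statistic / (positives * negatives)
```

Unlike the label-only metrics, this function needs the scores themselves so that ties can be detected and averaged.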

Models

Shared domain models for simi-search benchmarking.

class simi_search.models.Molecule(compound_id: str, smiles: str, label: int, target: str, split: str)[source]

Bases: object

Parameters:
  • compound_id (str)

  • smiles (str)

  • label (int)

  • target (str)

  • split (str)

compound_id: str
label: int
smiles: str
split: str
target: str

class simi_search.models.SimilarityResult(target: str, method: str, train_queries: int, metrics: RankingMetrics)[source]

Bases: object

Parameters:
  • target (str)

  • method (str)

  • train_queries (int)

  • metrics (RankingMetrics)

method: str
metrics: RankingMetrics
target: str
train_queries: int