API Reference
Benchmark orchestration
Benchmark orchestration for simi-search LIT-PCBA experiments.
- class simi_search.benchmark.LitPcbaTargetRepository(data_dir)
Bases: object
Read processed LIT-PCBA target CSVs.
- Parameters:
data_dir (Path)
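As a rough illustration of what reading a processed target CSV could involve, the sketch below parses rows into records whose fields mirror the Molecule model documented under Models. The column names, the MoleculeRow class, and the read_target_csv helper are assumptions made for this example; the actual file layout used by LitPcbaTargetRepository is not specified here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeRow:
    """Hypothetical record with the same fields as simi_search.models.Molecule."""

    compound_id: str
    smiles: str
    label: int
    target: str
    split: str


def read_target_csv(handle):
    """Parse a CSV stream into MoleculeRow records (column names are assumed)."""
    reader = csv.DictReader(handle)
    return [
        MoleculeRow(
            compound_id=row["compound_id"],
            smiles=row["smiles"],
            label=int(row["label"]),  # 1 = active, 0 = inactive
            target=row["target"],
            split=row["split"],
        )
        for row in reader
    ]


# Tiny in-memory sample standing in for a file under data_dir.
SAMPLE = """compound_id,smiles,label,target,split
c1,CCO,1,ADRB2,train
c2,CCN,0,ADRB2,test
"""

rows = read_target_csv(io.StringIO(SAMPLE))
train_actives = [r for r in rows if r.split == "train" and r.label == 1]
```

In a real run the handle would come from opening a file beneath data_dir rather than from an in-memory string.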
- class simi_search.benchmark.SimilarityBenchmarkRunner(*, repository, searcher=None)
Bases: object
Run active-query similarity retrieval for one or more LIT-PCBA targets.
- Parameters:
repository (LitPcbaTargetRepository)
searcher (MaxActiveSimilaritySearch | None)
- run(targets=None)
- Parameters:
targets (list[str] | None)
- Return type:
list[SimilarityResult]
Fingerprints
Fingerprint implementations for ligand similarity search.
- class simi_search.fingerprints.Fingerprinter
Bases: ABC
Interface for molecule featurizers used by similarity search.
- class simi_search.fingerprints.HashedSmilesFingerprint(*, n_bits=2048, min_ngram=2, max_ngram=4)
Bases: Fingerprinter
Dependency-free hashed token fingerprint for SMILES similarity.
- Parameters:
n_bits (int)
min_ngram (int)
max_ngram (int)
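The parameters above suggest a fingerprint built by hashing SMILES n-grams into a fixed number of bits. The following is a minimal sketch of that general idea, assuming character-level n-grams and a BLAKE2 hash; the tokenization and hash function used by HashedSmilesFingerprint may differ, and hashed_ngram_bits and tanimoto are names introduced only for this example.

```python
import hashlib


def hashed_ngram_bits(smiles, n_bits=2048, min_ngram=2, max_ngram=4):
    """Return the set of bit positions switched on by character n-grams.

    Assumption: n-grams are taken over raw characters of the SMILES string.
    """
    bits = set()
    for n in range(min_ngram, max_ngram + 1):
        for start in range(len(smiles) - n + 1):
            gram = smiles[start:start + n]
            # A stable hash keeps fingerprints reproducible across runs.
            digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
            bits.add(int.from_bytes(digest, "big") % n_bits)
    return frozenset(bits)


def tanimoto(a, b):
    """Tanimoto coefficient of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


ethanol = hashed_ngram_bits("CCO")
propanol = hashed_ngram_bits("CCCO")
similarity = tanimoto(ethanol, propanol)
```

Because every n-gram of "CCO" also occurs in "CCCO", the two fingerprints share bits and the similarity is strictly positive. Python's built-in hash() is avoided here because it is salted per process for strings.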
- class simi_search.fingerprints.RdkitMorganFingerprint(*, radius=2, n_bits=2048)
Bases: Fingerprinter
RDKit Morgan/ECFP bit-vector fingerprinter.
RDKit is optional so the base package remains lightweight. Install it with
pip install "simi-search[rdkit]" or conda install -c conda-forge rdkit.
- Parameters:
radius (int)
n_bits (int)
- simi_search.fingerprints.build_fingerprinter(name)
Create a supported fingerprinter by CLI/config name.
- Parameters:
name (str)
- Return type:
Fingerprint selection
The command line exposes the same backends as the Python API:
benchmark-similarity --fingerprint hashed
benchmark-similarity --fingerprint rdkit
rdkit uses Morgan fingerprints and requires RDKit to be installed.
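Since the rdkit backend depends on an optional package, a caller may want to check availability before selecting it. The sketch below shows one way to do that with the standard library. The resolve_backend helper and its fallback behaviour are assumptions for illustration; whether build_fingerprinter falls back or raises when RDKit is missing is not stated in this reference.

```python
import importlib.util

# The two backend names exposed by the CLI.
SUPPORTED_BACKENDS = ("hashed", "rdkit")


def resolve_backend(name, fallback="hashed"):
    """Return a usable backend name, falling back if RDKit is not importable."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown fingerprint backend: {name!r}")
    # find_spec returns None when the top-level package cannot be found.
    if name == "rdkit" and importlib.util.find_spec("rdkit") is None:
        return fallback
    return name


backend = resolve_backend("rdkit")
```

The resolved name could then be passed to build_fingerprinter or to the --fingerprint option.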
Similarity search
OOP ligand similarity search implementations.
- class simi_search.search.MaxActiveSimilaritySearch(*, fingerprinter=None, method_name='hashed_smiles', similarity=None)
Bases: object
Score candidates by maximum similarity to active reference ligands.
- Parameters:
fingerprinter (Fingerprinter | None)
method_name (str)
similarity (TanimotoSimilarity | None)
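The scoring rule described above, where each candidate receives its highest similarity to any active reference ligand, can be sketched in a few lines. This example assumes fingerprints are represented as sets of on-bit positions and uses Tanimoto similarity; max_active_scores is a name introduced here and is not the class's actual method.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_active_scores(active_fps, candidate_fps):
    """Score each candidate by its best match among the active references."""
    return [
        max((tanimoto(candidate, active) for active in active_fps), default=0.0)
        for candidate in candidate_fps
    ]


# Toy fingerprints as sets of bit positions.
actives = [frozenset({1, 2, 3}), frozenset({7, 8})]
candidates = [frozenset({1, 2, 3, 4}), frozenset({8, 9}), frozenset({20})]

scores = max_active_scores(actives, candidates)
# Indices of candidates sorted from most to least similar.
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

The first candidate shares three of four bits with the first active (0.75), the second shares one of three bits with the second active, and the third matches nothing.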
Metrics
Ranking metrics for virtual-screening benchmarks.
- class simi_search.metrics.RankingMetrics(compounds: int, actives: int, ef1: float, ef5: float, bedroc20: float, roc_auc: float, pr_auc: float, top1_actives: int, top5_actives: int)
Bases: object
- Parameters:
compounds (int)
actives (int)
ef1 (float)
ef5 (float)
bedroc20 (float)
roc_auc (float)
pr_auc (float)
top1_actives (int)
top5_actives (int)
- actives: int
- bedroc20: float
- compounds: int
- ef1: float
- ef5: float
- pr_auc: float
- roc_auc: float
- top1_actives: int
- top5_actives: int
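The ef1 and ef5 fields presumably hold enrichment factors at the top 1% and 5% of the ranked list. A minimal sketch of the usual definition follows: the active rate in the top slice divided by the active rate in the whole set. The enrichment_factor helper, the ceiling-based slice size, and the handling of empty inputs are assumptions; the library may round or break ties differently.

```python
import math


def enrichment_factor(labels, scores, percent):
    """Enrichment factor in the top `percent` of compounds ranked by score."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total = len(labels)
    total_actives = sum(labels)
    if total == 0 or total_actives == 0:
        return 0.0
    # Assumption: the slice size is rounded up and contains at least one compound.
    n_top = max(1, math.ceil(total * percent / 100))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])
    return (top_actives / n_top) / (total_actives / total)


# 100 compounds, 5 actives: one ranked first, four ranked last.
labels = [1] + [0] * 95 + [1] * 4
scores = [float(100 - i) for i in range(100)]

ef1 = enrichment_factor(labels, scores, 1)
ef5 = enrichment_factor(labels, scores, 5)
```

With a base rate of 5%, a top-1% slice consisting of a single active gives an enrichment of about 20, while one active among the top five compounds gives about 4.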
- simi_search.metrics.bedroc(labels, alpha=20.0)
Compute BEDROC using the Truchon/Bayly early-recognition formula.
- Parameters:
labels (Sequence[int])
alpha (float)
- Return type:
float
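For reference, BEDROC can be computed by rescaling the robust initial enhancement (RIE) between its minimum and maximum attainable values. The sketch below follows that formulation and assumes labels are already ordered from best-ranked to worst-ranked, which the signature suggests but this reference does not confirm. The name bedroc_sketch and the edge-case return values are choices made for this example.

```python
import math


def bedroc_sketch(ranked_labels, alpha=20.0):
    """BEDROC for binary labels ordered from best-ranked to worst-ranked."""
    total = len(ranked_labels)
    actives = sum(1 for label in ranked_labels if label)
    if total == 0 or actives == 0:
        return 0.0  # assumption: undefined cases map to 0.0
    if actives == total:
        return 1.0  # every ordering is equally good
    ratio = actives / total
    # Exponentially weighted sum over the 1-based ranks of the actives.
    exp_sum = sum(
        math.exp(-alpha * rank / total)
        for rank, label in enumerate(ranked_labels, start=1)
        if label
    )
    # Expected weight of one active under a uniformly random ranking.
    random_weight = (1.0 - math.exp(-alpha)) / (total * (math.exp(alpha / total) - 1.0))
    rie = exp_sum / (actives * random_weight)
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


best = bedroc_sketch([1, 1] + [0] * 98)
worst = bedroc_sketch([0] * 98 + [1, 1])
mixed = bedroc_sketch([1] + [0] * 98 + [1])
```

Placing all actives first yields a value of 1 and placing them last yields 0, up to floating-point error. Larger alpha values concentrate the weight on the very top of the list.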
- simi_search.metrics.compute_ranking_metrics(labels, scores)
- Parameters:
labels (Sequence[int])
scores (Sequence[float])
- Return type:
Models
Shared domain models for simi-search benchmarking.
- class simi_search.models.Molecule(compound_id: str, smiles: str, label: int, target: str, split: str)
Bases: object
- Parameters:
compound_id (str)
smiles (str)
label (int)
target (str)
split (str)
- compound_id: str
- label: int
- smiles: str
- split: str
- target: str
- class simi_search.models.SimilarityResult(target: str, method: str, train_queries: int, metrics: RankingMetrics)
Bases: object
- Parameters:
target (str)
method (str)
train_queries (int)
metrics (RankingMetrics)
- method: str
- metrics: RankingMetrics
- target: str
- train_queries: int