Getting Started
Installation
Choose the installation method that matches your workflow. Simi Search targets Python 3.11.
Option 1: PyPI
pip install simi-search
Option 2: Source checkout
git clone https://github.com/ThinhUMP/simi_search.git
cd simi_search
python -m pip install -r requirements.txt
Optional RDKit fingerprints
pip install "simi-search[rdkit]"
Or, in a conda environment:
conda install -c conda-forge rdkit
Option 3: Docker image
The release workflow publishes a container image to the GitHub Container Registry:
docker pull ghcr.io/thinhump/simi_search:latest
Run the benchmark command inside the image with a mounted workspace:
docker run --rm \
-v "$PWD/data:/app/data" \
-v "$PWD/results:/app/results" \
ghcr.io/thinhump/simi_search:latest \
--data-dir data/processed/lit_pcba_ave \
--output results/lit_pcba_similarity_metrics.csv
Quick Example
Prepare the AVE-unbiased LIT-PCBA benchmark:
download-lit-pcba --data-dir data --variant ave
Run a single-target similarity benchmark:
benchmark-similarity \
--data-dir data/processed/lit_pcba_ave \
--target PPARG \
--output results/PPARG_similarity_metrics.csv
Run the same target with RDKit Morgan fingerprints:
benchmark-similarity \
--data-dir data/processed/lit_pcba_ave \
--target PPARG \
--fingerprint rdkit \
--output results/PPARG_rdkit_similarity_metrics.csv
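To repeat the commands above for a hand-picked subset of targets, the invocation can be scripted. The following is a minimal sketch, not part of the package: `build_command` and `run_targets` are hypothetical helper names, and the output naming pattern simply mirrors the two examples above. Only the flags already shown (`--data-dir`, `--target`, `--fingerprint`, `--output`) are used.

```python
import shlex
import subprocess
from pathlib import PurePosixPath

# Paths taken from the examples above; adjust to your own layout.
DATA_DIR = PurePosixPath("data/processed/lit_pcba_ave")
RESULTS_DIR = PurePosixPath("results")


def build_command(target, fingerprint=None):
    """Return the argv list for one benchmark-similarity run (hypothetical helper)."""
    # Assumption: output files follow the naming used in the examples above.
    suffix = f"_{fingerprint}" if fingerprint else ""
    output = RESULTS_DIR / f"{target}{suffix}_similarity_metrics.csv"
    cmd = ["benchmark-similarity", "--data-dir", str(DATA_DIR), "--target", target]
    if fingerprint:
        cmd += ["--fingerprint", fingerprint]
    cmd += ["--output", str(output)]
    return cmd


def run_targets(targets, fingerprint=None, dry_run=True):
    """Print (dry run) or execute one command per target and return the commands."""
    commands = [build_command(target, fingerprint) for target in targets]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: prints the command without executing it.
run_targets(["PPARG"], fingerprint="rdkit", dry_run=True)
```

Pass `dry_run=False` to execute the commands once the printed output looks right.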
Run all processed targets:
benchmark-similarity \
--data-dir data/processed/lit_pcba_ave \
--output results/lit_pcba_similarity_metrics.csv
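Once a run has finished, the resulting CSV can be inspected with the Python standard library. This is a minimal sketch that makes no assumption about the column names: `parse_metrics`, `load_metrics`, and `summarize` are hypothetical helpers, not part of Simi Search.

```python
import csv
import io
from pathlib import Path


def parse_metrics(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # csv.DictReader returns every value as a string; convert numbers as needed.
    return list(csv.DictReader(io.StringIO(text)))


def load_metrics(path):
    """Read a benchmark CSV file from disk and parse it."""
    return parse_metrics(Path(path).read_text(encoding="utf-8"))


def summarize(rows):
    """Return (column names, row count) without assuming a particular schema."""
    columns = list(rows[0].keys()) if rows else []
    return columns, len(rows)


# Example usage, assuming the all-targets run above has completed:
# rows = load_metrics("results/lit_pcba_similarity_metrics.csv")
# columns, n_rows = summarize(rows)
# print(columns, n_rows)
```

Printing the header first is a cheap way to confirm which columns your installed version actually writes before building further analysis on top of them.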
Output Schema
The benchmark writes one row per target with the following columns:
| Column | Description |
|---|---|
| | LIT-PCBA target identifier. |
| | Similarity scoring method. |
| | Number of active training ligands used as references. |
| | Number of validation compounds ranked. |
| | Number of active validation compounds. |
| | Enrichment factors in the top 1% and 5%. |
| | Early-recognition metric with alpha 20. |
| | Global ranking and precision-recall metrics. |
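For readers unfamiliar with enrichment factors, the usual definition is the active rate among the top-ranked x% of compounds divided by the active rate in the whole ranked set. The sketch below illustrates that definition only; it is not taken from the Simi Search source. The function name, the round-up rule for the cutoff, and the tie handling (input order) are assumptions, and the package may differ on each.

```python
def enrichment_factor(scores, labels, top_percent):
    """Enrichment factor in the top `top_percent` of a score-ranked list.

    scores: similarity scores, higher means more likely active.
    labels: 1 for active, 0 for inactive, aligned with `scores`.
    top_percent: integer percentage, e.g. 1 or 5.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")

    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        return float("nan")  # undefined without compounds or actives

    # Assumption: the cutoff is rounded up and always keeps at least one compound.
    n_top = max(1, (top_percent * n_total + 99) // 100)

    # Rank by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])

    # Active rate in the selection divided by the active rate overall.
    return (hits / n_top) / (n_actives / n_total)
```

A value of 1 corresponds to random ranking, and the maximum attainable value is bounded by both `100 / top_percent` and `n_total / n_actives`.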
Next Steps
Read Methods for the benchmark protocol.
Read API Reference to extend the fingerprinter or searcher classes.
Read Release before publishing package artifacts.