Getting Started

Installation

Choose the installation method that matches your workflow. Simi Search targets Python 3.11.

Option 1: PyPI

pip install simi-search

Option 2: Source checkout

git clone https://github.com/ThinhUMP/simi_search.git
cd simi_search
python -m pip install -r requirements.txt

Optional RDKit fingerprints

pip install "simi-search[rdkit]"

or, in conda environments:

conda install -c conda-forge rdkit

Option 3: Docker image

The release workflow publishes a container to GitHub Container Registry:

docker pull ghcr.io/thinhump/simi_search:latest

Run the benchmark command inside the image with a mounted workspace:

docker run --rm \
  -v "$PWD/data:/app/data" \
  -v "$PWD/results:/app/results" \
  ghcr.io/thinhump/simi_search:latest \
  --data-dir data/processed/lit_pcba_ave \
  --output results/lit_pcba_similarity_metrics.csv

Quick Example

Prepare the AVE-unbiased LIT-PCBA benchmark:

download-lit-pcba --data-dir data --variant ave

Run a single-target similarity benchmark:

benchmark-similarity \
  --data-dir data/processed/lit_pcba_ave \
  --target PPARG \
  --output results/PPARG_similarity_metrics.csv

Run the same target with RDKit Morgan fingerprints:

benchmark-similarity \
  --data-dir data/processed/lit_pcba_ave \
  --target PPARG \
  --fingerprint rdkit \
  --output results/PPARG_rdkit_similarity_metrics.csv

Run all processed targets:

benchmark-similarity \
  --data-dir data/processed/lit_pcba_ave \
  --output results/lit_pcba_similarity_metrics.csv

Output Schema

The benchmark writes one row per target:

Column

Description

Target

LIT-PCBA target identifier.

Method

Similarity scoring method.

Train_Queries

Number of active training ligands used as references.

Compounds

Number of validation compounds ranked.

Actives

Number of active validation compounds.

EF1% / EF5%

Enrichment factors in the top 1% and 5%.

BEDROC20

Early-recognition metric with alpha 20.

ROC_AUC / PR_AUC

Global ranking and precision-recall metrics.

Next Steps

  • Read Methods for the benchmark protocol.

  • Read API Reference to extend the fingerprinter or searcher classes.

  • Read Release before publishing package artifacts.