Omics-Oracle: a retrieval-augmented GenAI agent that surfaces druggable targets by mining PubMed, ChEMBL, and PDB in a single query.
An AI-powered drug discovery platform that integrates multiple biological databases with large language model intelligence for natural-language therapeutic target analysis.
- LLM-Powered Intelligence: Natural language query understanding with GPT-4
- Disease-to-Target Mapping: Automatic discovery from disease names ("diabetes" → PPARG, DPP4, GLP1R)
- Multi-Database Integration: PubMed literature + ChEMBL bioactivity + RCSB PDB structures
- Async Performance: Concurrent API calls with intelligent SQLite caching
- Comprehensive Scoring: Literature evidence + inhibitor potency + structural data (0-10 scale)
- Production Ready: Rate limiting, fallback mechanisms, graceful error handling
- Intelligent Summaries: LLM-generated analysis with clinical insights
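As a rough illustration of the concurrent API calls mentioned above, here is a minimal sketch using asyncio.gather. The three fetch_* coroutines are hypothetical stand-ins that only simulate latency; they are not the project's actual PubMed, ChEMBL, or PDB clients.

```python
import asyncio

# Hypothetical stand-ins for the three data sources. Each one sleeps briefly
# to simulate network latency instead of calling a real API.
async def fetch_literature(target: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "pubmed", "target": target}

async def fetch_inhibitors(target: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "chembl", "target": target}

async def fetch_structures(target: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "pdb", "target": target}

async def gather_target_data(target: str) -> list:
    # Run the three lookups concurrently. With return_exceptions=True a
    # failing source comes back as an exception object instead of raising,
    # so the results from the other sources are still available.
    return await asyncio.gather(
        fetch_literature(target),
        fetch_inhibitors(target),
        fetch_structures(target),
        return_exceptions=True,
    )

results = asyncio.run(gather_target_data("EGFR"))
sources = [r["source"] for r in results]
print(sources)
```

The results come back in the same order as the coroutines were passed in, so the caller can tell which source each entry belongs to.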
# Clone and setup
git clone https://github.com/anugrahat/omics-oracle-.git
cd omics-oracle-
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Run immediately (no API keys required!)
python cli.py "type 2 diabetes therapeutic targets"
# Additional setup for beautiful web interface
pip install -r website/requirements.txt
# Launch Streamlit demo
cd website && streamlit run app.py
# Opens at http://localhost:8501
For Hiring Managers: see website/README.md for the deployment guide
# Copy the template
cp .env.example .env
# Edit .env file
nano .env # or your preferred editor
# Add your OpenAI API key:
OPENAI_API_KEY=sk-proj-your-actual-key-here
NCBI_API_KEY=your_ncbi_key_here # Optional for higher PubMed rate limits
- Visit OpenAI Platform
- Sign up/login to your account
- Click "Create new secret key"
- Copy the key (starts with sk-proj-)
- Paste it in your .env file
Why Add an OpenAI Key?
- Intelligent Query Parsing: "Find EGFR inhibitors under 50 nM" → Automatic IC50 filtering
- Disease Discovery: "diabetes" → Automatically finds PPARG, DPP4, GLP1R targets
- Beautiful Summaries: Clinical insights, target recommendations, publication-ready output
- Advanced Analysis: Confidence scoring, query type detection
No API Keys? Still Works, with Limited Features
- Without OpenAI: Falls back to regex parsing + curated disease mappings
- Without NCBI: Uses Europe PMC for literature search
- Core functionality is maintained through graceful degradation
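To give a feel for what regex-based fallback parsing could look like, the sketch below extracts an IC50 range from queries such as the ones in this README. The patterns and the parse_ic50_range name are assumptions for illustration, not the project's actual parser.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns covering the phrasings used in this README's examples.
_NUM = r"(\d+(?:\.\d+)?)"
_BETWEEN = re.compile(rf"between\s+{_NUM}\s+and\s+{_NUM}\s*nM", re.IGNORECASE)
_UNDER = re.compile(rf"(?:under|below|sub-)\s*{_NUM}\s*nM", re.IGNORECASE)

def parse_ic50_range(query: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (min_nM, max_nM); a bound is None when the query does not state it."""
    match = _BETWEEN.search(query)
    if match:
        # Sort so that "between 50 and 5 nM" still yields (5.0, 50.0).
        low, high = sorted((float(match.group(1)), float(match.group(2))))
        return low, high
    match = _UNDER.search(query)
    if match:
        return None, float(match.group(1))
    return None, None

print(parse_ic50_range("Find potent EGFR inhibitors between 5 and 50 nM"))
print(parse_ic50_range("JAK2 inhibitors under 10 nM"))
```

A query with no potency phrase returns (None, None), which a caller could treat as "no IC50 filter".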
Input:
python cli.py "type 2 diabetes therapeutic targets"
Output:
Therapeutic Target Agent v2.0
Query: type 2 diabetes therapeutic targets
============================================================
Parsing query with LLM...
Detected disease query, switching to disease mode...
Multi-target analysis: PPARG, SLC2A4, ABCC8, DPP4, GLP1R
TOP TARGETS:
1. PPARG (Score: 8.4) - 50 inhibitors, 20 structures
2. DPP4 (Score: 7.9) - 50 inhibitors, 20 structures
3. ABCC8 (Score: 6.7) - 50 inhibitors, 0 structures
4. GLP1R (Score: 6.7) - 50 inhibitors, 0 structures
5. SLC2A4 (Score: 5.6) - 50 inhibitors, 0 structures
TOP INHIBITORS:
1. CHEMBL107367 (PPARG): 0.35 nM - Sub-nanomolar potency!
2. CHEMBL1627326 (DPP4): 2.0 nM - Ultra-potent!
3. CHEMBL3330880 (ABCC8): 0.38 nM - Extraordinary potency!
More Disease Examples:
python cli.py "ovarian cancer drug targets"
# → Finds: BRCA1, BRCA2, PARP1, PIK3CA, PTEN
python cli.py "Alzheimer's disease targets"
# → Finds: APP, MAPT, APOE, BACE1, PSEN1
python cli.py "Parkinson's disease therapeutic targets"
# → Finds: LRRK2, SNCA, PINK1, PARK2, DJ1
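When no LLM is available, disease queries rely on curated mappings. A minimal sketch of such a lookup is shown below; the gene lists are copied from the example outputs in this README, while the dictionary layout, the matching rule, and the map_disease_to_targets name are assumptions rather than the project's real disease_mapper.py.

```python
from typing import Dict, List

# Gene lists taken from this README's example outputs; keys are hypothetical.
CURATED_TARGETS: Dict[str, List[str]] = {
    "type 2 diabetes": ["PPARG", "SLC2A4", "ABCC8", "DPP4", "GLP1R"],
    "ovarian cancer": ["BRCA1", "BRCA2", "PARP1", "PIK3CA", "PTEN"],
    "alzheimer": ["APP", "MAPT", "APOE", "BACE1", "PSEN1"],
    "parkinson": ["LRRK2", "SNCA", "PINK1", "PARK2", "DJ1"],
}

def map_disease_to_targets(query: str) -> List[str]:
    """Return curated targets for the first disease name found in the query."""
    text = query.lower()
    # Check longer keys first so a specific name wins over a shorter one.
    for disease in sorted(CURATED_TARGETS, key=len, reverse=True):
        if disease in text:
            return CURATED_TARGETS[disease]
    return []  # unknown disease: nothing curated to fall back on

print(map_disease_to_targets("ovarian cancer drug targets"))
```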
Input:
python cli.py "Find potent EGFR inhibitors between 5 and 50 nM"
Output:
Therapeutic Target Agent v2.0
Query: Find potent EGFR inhibitors between 5 and 50 nM
============================================================
Parsing query with LLM...
Single target detected: EGFR
IC50 range: 5 - 50 nM
Analyzing target: EGFR
EGFR Target Analysis (Score: 7.9/10.0)
Assessment: Good drug target
TOP INHIBITORS:
1. CHEMBL311111 - IC50: 5.0 nM (Quality: 0.70)
2. CHEMBL310740 - IC50: 6.0 nM (Quality: 0.70)
3. CHEMBL319065 - IC50: 6.0 nM (Quality: 0.70)
4. CHEMBL98475 - IC50: 6.0 nM (Quality: 0.70)
5. CHEMBL80540 - IC50: 7.0 nM (Quality: 0.70)
STRUCTURES: 20 high-quality structures found
LITERATURE: 20 relevant research papers
More IC50 Examples:
# Ultra-potent inhibitors
python cli.py "JAK2 inhibitors under 10 nM"
# Broader range
python cli.py "BRAF targets with inhibitors between 10 and 500 nM"
# Multiple targets with potency
python cli.py "EGFR, JAK2, BRAF targets with sub-100 nM inhibitors"
Input:
python cli.py "EGFR, JAK2, BRAF cancer targets"
Output:
Target Classification:
• Excellent targets (2): JAK2, BRAF
• Good targets (1): EGFR
• Moderate targets (0):
• Challenging targets (0):
TOP TARGETS:
1. JAK2 (Score: 8.4) - 33 inhibitors, 20 structures
2. BRAF (Score: 8.3) - 39 inhibitors, 20 structures
3. EGFR (Score: 7.9) - 24 inhibitors, 20 structures
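The grouping into Excellent, Good, Moderate, and Challenging can be pictured as simple score banding. The cut-offs below (8.0, 6.0, 4.0) are assumptions chosen only to agree with the sample outputs in this README; the project's real thresholds are not documented here.

```python
from typing import Dict, List

# Hypothetical cut-offs, highest band first. Not taken from the project source.
THRESHOLDS = [(8.0, "Excellent"), (6.0, "Good"), (4.0, "Moderate")]

def classify_target(score: float) -> str:
    """Map a 0-10 target score to a band name."""
    for cutoff, label in THRESHOLDS:
        if score >= cutoff:
            return label
    return "Challenging"

def group_targets(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group target names by band, listing higher scores first within a band."""
    groups: Dict[str, List[str]] = {
        "Excellent": [], "Good": [], "Moderate": [], "Challenging": []
    }
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        groups[classify_target(score)].append(name)
    return groups

groups = group_targets({"EGFR": 7.9, "JAK2": 8.4, "BRAF": 8.3})
print(groups)
```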
Input:
python cli.py "PARP1 therapeutic targets"
Output:
PARP1 Target Analysis (Score: 8.2/10.0)
Assessment: Excellent drug target
TOP INHIBITORS:
1. CHEMBL589586 - IC50: 0.38 nM (Olaparib-like)
2. CHEMBL1173055 - IC50: 0.6 nM (Clinical candidate)
3. CHEMBL2107856 - IC50: 0.9 nM (Research compound)
CLINICAL RELEVANCE: FDA-approved PARP inhibitors for BRCA-mutant cancers
# Custom output file
python cli.py "diabetes targets" --output my_diabetes_analysis.json
# Disease analysis with structured output
python cli.py "lung cancer targets" --output lung_cancer_2024.json
Input:
python cli.py "Alzheimer's disease therapeutic targets"
Complete Output:
Therapeutic Target Agent v2.0
Query: Alzheimer's disease therapeutic targets
============================================================
Parsing query with LLM...
Detected disease query, switching to disease mode...
Multi-target analysis: PSEN1, MAPT, BACE1, APP, APOE
Analyzing target: PSEN1
Gathering data from PubMed, ChEMBL, and PDB...
Literature: 20 results
Inhibitors: 50 results
Structures: 2 results
[... analysis continues for all targets ...]
============================================================
ANALYSIS SUMMARY
============================================================
Multi-Target Analysis Summary
Analyzed 5 targets (Average score: 7.7/10.0)
Target Classification:
• Excellent targets (4): PSEN1, MAPT, BACE1, APP
• Good targets (0):
• Moderate targets (1): APOE
• Challenging targets (0):
TOP TARGETS:
1. PSEN1 (Score: 8.4) - 50 inhibitors, 2 structures
2. MAPT (Score: 8.4) - 50 inhibitors, 1 structures
3. BACE1 (Score: 8.4) - 50 inhibitors, 20 structures
4. APP (Score: 8.4) - 50 inhibitors, 20 structures
5. APOE (Score: 4.7) - 0 inhibitors, 7 structures
Generating intelligent summary...
================================================================================
INTELLIGENT ANALYSIS SUMMARY
================================================================================
EXECUTIVE SUMMARY
-----------------
The analysis of potential therapeutic targets for Alzheimer's disease reveals
five key genes: PSEN1, MAPT, BACE1, APP, and APOE. These targets exhibit
varying degrees of druggability, with PSEN1, MAPT, BACE1, and APP showing
high target scores and significant number of inhibitors.
TARGET RANKING TABLE
--------------------
Target | Score | Inhibitors | Structures | Literature
-------|--------|------------|------------|-----------
PSEN1 | 8.45 | 50 | 2 | 20
MAPT | 8.45 | 50 | 1 | 20
BACE1 | 8.435 | 50 | 20 | 20
APP | 8.38 | 50 | 20 | 20
APOE | 4.73 | 0 | 7 | 20
KEY INSIGHTS
------------
• PSEN1, MAPT, BACE1, and APP show high druggability with numerous inhibitors
• BACE1 has excellent structural data (20 structures) for drug design
• APOE presents challenges but remains important for genetic risk
• No current FDA-approved drugs, but active research pipelines exist
INHIBITOR HIGHLIGHTS
---------------------
1. CHEMBL392068 (PSEN1): IC50 = 0.114 nM - Sub-nanomolar!
2. CHEMBL2036430 (MAPT): IC50 = 0.48 nM - Tau aggregation inhibitor
3. CHEMBL4452566 (BACE1): IC50 = 0.275 nM - β-secretase inhibitor
STRUCTURAL ANALYSIS
-------------------
• BACE1: Best structure (1.46 Å) - Excellent for drug design
• APP: High-resolution structures (1.5 Å) available
• PSEN1: Cryo-EM structures (3.3 Å) - Membrane protein complex
• APOE: Multiple high-quality structures (1.4 Å)
CLINICAL RECOMMENDATIONS
------------------------
PSEN1, MAPT, BACE1, and APP should be prioritized for further investigation
due to their high target scores and number of inhibitors. APOE, despite having
no inhibitors, should not be overlooked due to its genetic significance.
================================================================================
Analysis complete! Results saved to alzheimers_targets.json
├── cli.py                    # Command-line interface
├── thera_agent/              # Core RAG system
│   ├── agent.py              # Main orchestrator
│   ├── query_parser.py       # LLM query understanding
│   ├── disease_mapper.py     # Disease→target mapping
│   ├── result_summarizer.py  # Intelligent summaries
│   └── data/
│       ├── cache.py          # SQLite caching
│       ├── http_client.py    # Async HTTP client
│       ├── pubmed_client.py  # Literature search
│       ├── chembl_client.py  # Bioactivity data
│       └── pdb_client.py     # Protein structures
└── website/                  # Web demo for recruiters
    ├── app.py                # Streamlit interface
    ├── requirements.txt      # Web dependencies
    └── README.md             # Deployment guide
Component | Primary | Fallback | Impact |
---|---|---|---|
Query Parsing | OpenAI GPT-4 | Regex extraction | Full functionality |
Disease Mapping | LLM discovery | Curated mappings | Core diseases covered |
Literature | PubMed API | Europe PMC | 95%+ coverage |
Summaries | LLM analysis | Structured tables | Professional output |
Data Storage | SQLite cache | Live API calls | Performance maintained |
Rate Limits | Backoff/retry | Cached results | Graceful degradation |
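The "Backoff/retry" entry in the table refers to retrying a failed request after progressively longer waits. Below is a minimal sketch of exponential backoff with jitter; the retry_with_backoff helper, its parameters, and the simulated flaky_request are illustrative assumptions, not the project's HTTP client.

```python
import random
import time

def retry_with_backoff(call, retries=4, base_delay=0.5, sleep=time.sleep):
    """Call `call()` until it succeeds or `retries` attempts are used up."""
    for attempt in range(retries):
        try:
            return call()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... plus a little random jitter
            # so that parallel clients do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))

# Hypothetical flaky request: fails twice, then succeeds.
attempts = {"count": 0}

def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated rate limit")
    return "ok"

result = retry_with_backoff(flaky_request, base_delay=0.01)
print(result, attempts["count"])
```

If every attempt fails, the last exception propagates, at which point a caller could fall back to cached results as the table describes.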
Discovers clinically validated targets:
- Type 2 Diabetes: PPARG (pioglitazone), DPP4 (sitagliptin), GLP1R (semaglutide)
- Cancer: BRCA1/2 (olaparib), PIK3CA (alpelisib), EGFR (erlotinib)
- Alzheimer's: APP, BACE1, PSEN1 (active research targets)
- Type 1 Diabetes: INS, PTPN22, GAD65 (autoimmune targets)
- Literature: PubMed (35M+ papers) + Europe PMC fallback
- Bioactivity: ChEMBL (2M+ compounds, IC50/Ki values)
- Structures: RCSB PDB (200K+ protein structures)
- Intelligence: OpenAI GPT-4 for query understanding & summaries
- Async/await architecture for performance
- SQLite caching with intelligent TTL
- Rate limiting and exponential backoff
- Comprehensive logging and error handling
- Works offline with cached data
- No API keys required for basic functionality
- LLM-powered summaries for publication-ready output
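As an illustration of SQLite caching with a time-to-live, the sketch below stores JSON responses with a timestamp and treats entries older than the TTL as misses. The TTLCache class, its table layout, and the one-hour default are assumptions for this example, not the contents of the project's cache.py.

```python
import json
import sqlite3
import time

class TTLCache:
    """Tiny key/value cache backed by SQLite, with a fixed time-to-live."""

    def __init__(self, path=":memory:", ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def set(self, key, value, now=None):
        # `now` is injectable so expiry can be tested without waiting.
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), now),
        )
        self.db.commit()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None or now - row[1] > self.ttl:
            return None  # missing or expired
        return json.loads(row[0])

cache = TTLCache(ttl_seconds=60)
cache.set("chembl:EGFR", {"inhibitors": 24}, now=1000.0)
print(cache.get("chembl:EGFR", now=1030.0))
```

Passing a file path instead of ":memory:" would let cached results survive between runs, which is what makes offline use possible.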
- Python 3.8+
- Dependencies: pip install -r requirements.txt
- Optional: OpenAI API key for enhanced LLM features
- Optional: NCBI API key for higher PubMed rate limits
# Rare disease research
python cli.py "cystic fibrosis therapeutic targets"
python cli.py "Huntington's disease protein targets"
# Comparative analysis
python cli.py "CDK4, CDK6, CDK9 kinase inhibitor comparison"
python cli.py "EGFR, HER2, HER3 receptor family analysis"
# Structural biology
python cli.py "Find membrane protein targets with crystal structures"
# Target assessment
python cli.py "Assess druggability of KRAS G12C mutant"
python cli.py "Find backup targets for failed Alzheimer's programs"
# Competitive intelligence
python cli.py "JAK family inhibitors clinical pipeline"
python cli.py "PD-1, PD-L1, CTLA-4 immunotherapy targets"
# Portfolio planning
python cli.py "Oncology targets with sub-10 nM inhibitors"
# Resistance mechanisms
python cli.py "EGFR resistance mutation targets"
python cli.py "Find alternative targets for chemotherapy resistance"
# Combination therapy
python cli.py "DNA repair pathway targets for PARP combination"
python cli.py "Immunotherapy combination targets"
# Personalized medicine
python cli.py "BRCA1/2 deficient cancer targets"
# Find new indications
python cli.py "Antiviral targets with existing inhibitors"
python cli.py "Inflammation targets with FDA-approved drugs"
# Cross-disease analysis
python cli.py "Shared targets between diabetes and Alzheimer's"
- Fork the repository
- Create feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open Pull Request
MIT License - Open source for research and commercial use.
If you use Omics Oracle in your research, please cite:
@software{omics_oracle_2024,
title={Omics Oracle: AI-Powered Therapeutic Target Discovery},
author={Anugraha T},
year={2024},
url={https://github.com/anugrahat/omics-oracle-}
}
Feature | Without OpenAI | With OpenAI Key |
---|---|---|
Query Understanding | Basic regex | Natural language |
Disease Mapping | 12 curated diseases | Any disease |
IC50 Filtering | Manual parsing | Automatic detection |
Results Summary | Basic tables | Clinical insights |
Query Types | Limited | Unlimited flexibility |
Built for the drug discovery community
Advancing precision medicine through intelligent target discovery
Star this repo if it helps your research!