anugrahat/omics-oracle_website

Omics-Oracle: a Retrieval-Augmented GenAI agent that surfaces druggable targets by mining PubMed, ChEMBL, and PDB in one shot.

AI-powered drug discovery platform that integrates multiple biological databases with Large Language Model intelligence for natural language therapeutic target analysis.


✨ Key Features

  • 🤖 LLM-Powered Intelligence: Natural language query understanding with GPT-4
  • 🎯 Disease-to-Target Mapping: Automatic discovery from disease names ("diabetes" → PPARG, DPP4, GLP1R)
  • 📊 Multi-Database Integration: PubMed literature + ChEMBL bioactivity + RCSB PDB structures
  • ⚡ Async Performance: Concurrent API calls with intelligent SQLite caching
  • 🏆 Comprehensive Scoring: Literature evidence + inhibitor potency + structural data (0-10 scale)
  • 🛡️ Production Ready: Rate limiting, fallback mechanisms, graceful error handling
  • 📋 Intelligent Summaries: LLM-generated analysis with clinical insights
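
To illustrate the concurrent API calls mentioned above, here is a minimal sketch using `asyncio.gather`. The three `fetch_*` coroutines are hypothetical stand-ins for the PubMed, ChEMBL, and PDB clients; they sleep instead of making HTTP requests, and their names and return shapes are assumptions rather than the project's actual API.

```python
import asyncio

# Hypothetical stand-ins for the PubMed, ChEMBL, and PDB clients.
# Real clients would issue HTTP requests instead of sleeping.
async def fetch_literature(target: str) -> list:
    await asyncio.sleep(0.01)
    return [f"paper about {target}"]

async def fetch_inhibitors(target: str) -> list:
    await asyncio.sleep(0.01)
    return [{"target": target, "ic50_nm": 5.0}]

async def fetch_structures(target: str) -> list:
    await asyncio.sleep(0.01)
    return [f"structure of {target}"]

async def gather_target_data(target: str) -> dict:
    # Run the three lookups concurrently rather than one after another.
    literature, inhibitors, structures = await asyncio.gather(
        fetch_literature(target),
        fetch_inhibitors(target),
        fetch_structures(target),
    )
    return {
        "target": target,
        "literature": literature,
        "inhibitors": inhibitors,
        "structures": structures,
    }

result = asyncio.run(gather_target_data("EGFR"))
print(sorted(result))
```

Because the three lookups wait on the network independently, gathering them means the total latency is roughly that of the slowest call rather than the sum of all three.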

🚀 Quick Start

💻 CLI Version (Core Tool)

# Clone and setup
git clone https://github.com/anugrahat/omics-oracle-.git
cd omics-oracle-
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run immediately (no API keys required!)
python cli.py "type 2 diabetes therapeutic targets"

🌐 Web Interface (For Recruiters/Demos)

# Additional setup for beautiful web interface
pip install -r website/requirements.txt

# Launch Streamlit demo
cd website && streamlit run app.py
# Opens at http://localhost:8501

🎯 For Hiring Managers: See website/README.md for deployment guide

🔧 Configuration (Optional but Recommended)

Step 1: Set up Environment File

# Copy the template
cp .env.example .env

Step 2: Add OpenAI API Key (Recommended)

# Edit .env file
nano .env  # or your preferred editor

# Add your OpenAI API key:
OPENAI_API_KEY=sk-proj-your-actual-key-here
NCBI_API_KEY=your_ncbi_key_here  # Optional for higher PubMed rate limits

Step 3: Get OpenAI API Key

  1. Visit OpenAI Platform
  2. Sign up/login to your account
  3. Click "Create new secret key"
  4. Copy the key (starts with sk-proj-)
  5. Paste it in your .env file
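
As a rough sketch of how a tool might decide which mode to run in, the snippet below reads the two variables named above from the process environment. It assumes the values have already been exported; how the project itself loads `.env` is not shown here, and `load_api_keys` is a hypothetical helper name.

```python
import os

def load_api_keys(env=None) -> dict:
    """Read optional API keys; empty or missing values are treated as absent."""
    env = os.environ if env is None else env
    openai_key = env.get("OPENAI_API_KEY") or None
    ncbi_key = env.get("NCBI_API_KEY") or None
    return {
        "openai_key": openai_key,
        "ncbi_key": ncbi_key,
        # Without an OpenAI key the LLM features would be switched off.
        "use_llm": openai_key is not None,
        "has_ncbi_key": ncbi_key is not None,
    }

config = load_api_keys({"OPENAI_API_KEY": "sk-proj-your-actual-key-here"})
print(config["use_llm"], config["has_ncbi_key"])
```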

💡 Why Add OpenAI Key?

  • 🧠 Intelligent Query Parsing: "Find EGFR inhibitors under 50 nM" → Automatic IC50 filtering
  • 🎯 Disease Discovery: "diabetes" → Automatically finds PPARG, DPP4, GLP1R targets
  • 📋 Beautiful Summaries: Clinical insights, target recommendations, publication-ready output
  • 🔬 Advanced Analysis: Confidence scoring, query type detection

✅ No API Keys? Still Works (With Limited Features)

  • Without OpenAI: Falls back to regex parsing + curated disease mappings
  • Without NCBI: Uses Europe PMC for literature search
  • Core functionality is maintained through graceful degradation
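
The regex fallback for potency phrases could look roughly like the sketch below. The patterns and the `parse_ic50_range` name are assumptions written to match the example queries in this README, not the project's actual parser.

```python
import re

_NUM = r"(\d+(?:\.\d+)?)"  # integer or decimal number

def parse_ic50_range(query: str):
    """Return (min_nm, max_nm) from a query; None means unbounded."""
    q = query.lower()
    # "between 5 and 50 nM"
    m = re.search(rf"between\s+{_NUM}\s+and\s+{_NUM}\s*nm", q)
    if m:
        return float(m.group(1)), float(m.group(2))
    # "under 10 nM", "below 10 nM", "less than 10 nM", "sub-100 nM"
    m = re.search(rf"(?:under|below|less than|sub-)\s*{_NUM}\s*nm", q)
    if m:
        return None, float(m.group(1))
    return None, None

print(parse_ic50_range("Find potent EGFR inhibitors between 5 and 50 nM"))
print(parse_ic50_range("JAK2 inhibitors under 10 nM"))
```

A query with no potency phrase returns `(None, None)`, which a caller could treat as "no IC50 filter".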

💡 Usage Examples

🧠 Disease-Based Discovery (Recommended)

Input:

python cli.py "type 2 diabetes therapeutic targets"

Output:

🚀 Therapeutic Target Agent v2.0
Query: type 2 diabetes therapeutic targets
============================================================
🤖 Parsing query with LLM...
🧠 Detected disease query, switching to disease mode...
🎯 Multi-target analysis: PPARG, SLC2A4, ABCC8, DPP4, GLP1R

🎯 TOP TARGETS:
1. PPARG (Score: 8.4) - 50 inhibitors, 20 structures
2. DPP4 (Score: 7.9) - 50 inhibitors, 20 structures
3. ABCC8 (Score: 6.7) - 50 inhibitors, 0 structures
4. GLP1R (Score: 6.7) - 50 inhibitors, 0 structures
5. SLC2A4 (Score: 5.6) - 50 inhibitors, 0 structures

💊 TOP INHIBITORS:
1. CHEMBL107367 (PPARG): 0.35 nM - Sub-nanomolar potency!
2. CHEMBL1627326 (DPP4): 2.0 nM - Ultra-potent!
3. CHEMBL3330880 (ABCC8): 0.38 nM - Extraordinary potency!

More Disease Examples:

python cli.py "ovarian cancer drug targets"  
# → Finds: BRCA1, BRCA2, PARP1, PIK3CA, PTEN

python cli.py "Alzheimer's disease targets"
# → Finds: APP, MAPT, APOE, BACE1, PSEN1

python cli.py "Parkinson's disease therapeutic targets"
# → Finds: LRRK2, SNCA, PINK1, PARK2, DJ1

🎯 IC50 Range Filtering

Input:

python cli.py "Find potent EGFR inhibitors between 5 and 50 nM"

Output:

🚀 Therapeutic Target Agent v2.0
Query: Find potent EGFR inhibitors between 5 and 50 nM
============================================================
🤖 Parsing query with LLM...
🎯 Single target detected: EGFR
🔬 IC50 range: 5 - 50 nM
🎯 Analyzing target: EGFR

EGFR Target Analysis (Score: 7.9/10.0)
Assessment: Good drug target

💊 TOP INHIBITORS:
1. CHEMBL311111 - IC50: 5.0 nM (Quality: 0.70)
2. CHEMBL310740 - IC50: 6.0 nM (Quality: 0.70)
3. CHEMBL319065 - IC50: 6.0 nM (Quality: 0.70)
4. CHEMBL98475 - IC50: 6.0 nM (Quality: 0.70)
5. CHEMBL80540 - IC50: 7.0 nM (Quality: 0.70)

🧬 STRUCTURES: 20 high-quality structures found
📚 LITERATURE: 20 relevant research papers

More IC50 Examples:

# Ultra-potent inhibitors
python cli.py "JAK2 inhibitors under 10 nM"

# Broader range
python cli.py "BRAF targets with inhibitors between 10 and 500 nM"

# Multiple targets with potency
python cli.py "EGFR, JAK2, BRAF targets with sub-100 nM inhibitors"

⚖️ Multi-Target Comparison

Input:

python cli.py "EGFR, JAK2, BRAF cancer targets"

Output:

🎯 Target Classification:
• Excellent targets (2): JAK2, BRAF
• Good targets (1): EGFR
• Moderate targets (0):
• Challenging targets (0):

🎯 TOP TARGETS:
1. JAK2 (Score: 8.4) - 33 inhibitors, 20 structures
2. BRAF (Score: 8.3) - 39 inhibitors, 20 structures
3. EGFR (Score: 7.9) - 24 inhibitors, 20 structures
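
The grouping in this output can be pictured as a simple threshold rule on the 0-10 score. The cutoffs below (8.0, 6.0, 4.0) are assumptions picked only so that they agree with the sample outputs in this README; the real thresholds are not documented here, and `classify_target` is a hypothetical name.

```python
def classify_target(score: float) -> str:
    # ASSUMED cutoffs, chosen only to agree with the sample output above.
    if score >= 8.0:
        return "Excellent"
    if score >= 6.0:
        return "Good"
    if score >= 4.0:
        return "Moderate"
    return "Challenging"

# Scores copied from the sample output above.
scores = {"JAK2": 8.4, "BRAF": 8.3, "EGFR": 7.9}

groups = {}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    groups.setdefault(classify_target(score), []).append(name)

for label, names in groups.items():
    print(f"{label} targets ({len(names)}): {', '.join(names)}")
```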

🧬 Specific Protein Targets

Input:

python cli.py "PARP1 therapeutic targets"

Output:

PARP1 Target Analysis (Score: 8.2/10.0)
Assessment: Excellent drug target

💊 TOP INHIBITORS:
1. CHEMBL589586 - IC50: 0.38 nM (Olaparib-like)
2. CHEMBL1173055 - IC50: 0.6 nM (Clinical candidate)
3. CHEMBL2107856 - IC50: 0.9 nM (Research compound)

🏥 CLINICAL RELEVANCE: FDA-approved PARP inhibitors for BRCA-mutant cancers

💾 Save Results

# Custom output file
python cli.py "diabetes targets" --output my_diabetes_analysis.json

# Disease analysis with structured output
python cli.py "lung cancer targets" --output lung_cancer_2024.json

📋 Complete LLM-Powered Analysis Example

Input:

python cli.py "Alzheimer's disease therapeutic targets"

Complete Output:

🚀 Therapeutic Target Agent v2.0
Query: Alzheimer's disease therapeutic targets
============================================================
🤖 Parsing query with LLM...
🧠 Detected disease query, switching to disease mode...
🎯 Multi-target analysis: PSEN1, MAPT, BACE1, APP, APOE

🎯 Analyzing target: PSEN1
📊 Gathering data from PubMed, ChEMBL, and PDB...
✅ Literature: 20 results
✅ Inhibitors: 50 results
✅ Structures: 2 results

[... analysis continues for all targets ...]

============================================================
📋 ANALYSIS SUMMARY
============================================================
Multi-Target Analysis Summary
Analyzed 5 targets (Average score: 7.7/10.0)

🎯 Target Classification:
• Excellent targets (4): PSEN1, MAPT, BACE1, APP
• Good targets (0):
• Moderate targets (1): APOE
• Challenging targets (0):

🎯 TOP TARGETS:
1. PSEN1 (Score: 8.4) - 50 inhibitors, 2 structures
2. MAPT (Score: 8.4) - 50 inhibitors, 1 structures
3. BACE1 (Score: 8.4) - 50 inhibitors, 20 structures
4. APP (Score: 8.4) - 50 inhibitors, 20 structures
5. APOE (Score: 4.7) - 0 inhibitors, 7 structures

🤖 Generating intelligent summary...

================================================================================
📋 INTELLIGENT ANALYSIS SUMMARY
================================================================================

EXECUTIVE SUMMARY
-----------------
The analysis of potential therapeutic targets for Alzheimer's disease reveals 
five key genes: PSEN1, MAPT, BACE1, APP, and APOE. These targets exhibit 
varying degrees of druggability, with PSEN1, MAPT, BACE1, and APP showing 
high target scores and significant number of inhibitors.

TARGET RANKING TABLE
--------------------
Target | Score  | Inhibitors | Structures | Literature
-------|--------|------------|------------|-----------
PSEN1  | 8.45   | 50         | 2          | 20
MAPT   | 8.45   | 50         | 1          | 20
BACE1  | 8.435  | 50         | 20         | 20
APP    | 8.38   | 50         | 20         | 20
APOE   | 4.73   | 0          | 7          | 20

KEY INSIGHTS
------------
• PSEN1, MAPT, BACE1, and APP show high druggability with numerous inhibitors
• BACE1 has excellent structural data (20 structures) for drug design
• APOE presents challenges but remains important for genetic risk
• No current FDA-approved drugs, but active research pipelines exist

INHIBITOR HIGHLIGHTS
---------------------
1. CHEMBL392068 (PSEN1): IC50 = 0.114 nM - Sub-nanomolar!
2. CHEMBL2036430 (MAPT): IC50 = 0.48 nM - Tau aggregation inhibitor
3. CHEMBL4452566 (BACE1): IC50 = 0.275 nM - β-secretase inhibitor

STRUCTURAL ANALYSIS
-------------------
• BACE1: Best structure (1.46 Å) - Excellent for drug design
• APP: High-resolution structures (1.5 Å) available
• PSEN1: Cryo-EM structures (3.3 Å) - Membrane protein complex
• APOE: Multiple high-quality structures (1.4 Å)

CLINICAL RECOMMENDATIONS
------------------------
PSEN1, MAPT, BACE1, and APP should be prioritized for further investigation 
due to their high target scores and number of inhibitors. APOE, despite having 
no inhibitors, should not be overlooked due to its genetic significance.

================================================================================

✅ Analysis complete! Results saved to alzheimers_targets.json

🏗️ Architecture

├── cli.py                # 💻 Command-line interface
├── thera_agent/          # 🧬 Core RAG system
│   ├── agent.py          # Main orchestrator
│   ├── query_parser.py   # LLM query understanding
│   ├── disease_mapper.py # Disease→target mapping
│   ├── result_summarizer.py # Intelligent summaries
│   └── data/
│       ├── cache.py      # SQLite caching
│       ├── http_client.py # Async HTTP client
│       ├── pubmed_client.py # Literature search
│       ├── chembl_client.py # Bioactivity data
│       └── pdb_client.py # Protein structures
└── website/              # 🌐 Web demo for recruiters
    ├── app.py            # Streamlit interface
    ├── requirements.txt  # Web dependencies
    └── README.md         # Deployment guide

🛡️ Robust Fallback Systems

Component       | Primary       | Fallback          | Impact
----------------|---------------|-------------------|-----------------------
Query Parsing   | OpenAI GPT-4  | Regex extraction  | Full functionality
Disease Mapping | LLM discovery | Curated mappings  | Core diseases covered
Literature      | PubMed API    | Europe PMC        | 95%+ coverage
Summaries       | LLM analysis  | Structured tables | Professional output
Data Storage    | SQLite cache  | Live API calls    | Performance maintained
Rate Limits     | Backoff/retry | Cached results    | Graceful degradation
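
The primary/fallback pattern in this table, combined with backoff and retry, can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `call_with_fallback`, `pubmed_search`, and `europe_pmc_search` are hypothetical names, and the `sleep` parameter is injectable so the example runs without actually waiting.

```python
import time

def call_with_fallback(primary, fallback, retries=3, base_delay=0.5, sleep=time.sleep):
    """Try primary() up to `retries` times with exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return primary()
        except Exception:
            # Wait 0.5 s, 1 s, 2 s, ... between attempts (no wait after the last).
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback()

def pubmed_search():
    # Hypothetical primary source that is currently rate limited.
    raise RuntimeError("HTTP 429: too many requests")

def europe_pmc_search():
    # Hypothetical fallback source.
    return ["result from Europe PMC"]

delays = []  # record the requested delays instead of sleeping
papers = call_with_fallback(pubmed_search, europe_pmc_search, sleep=delays.append)
print(papers, delays)
```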

🎯 Real-World Validation

Discovers clinically validated targets:

  • Type 2 Diabetes: PPARG (pioglitazone), DPP4 (sitagliptin), GLP1R (semaglutide)
  • Cancer: BRCA1/2 (olaparib, a PARP inhibitor used in BRCA-mutant cancers), PIK3CA (alpelisib), EGFR (erlotinib)
  • Alzheimer's: APP, BACE1, PSEN1 (active research targets)
  • Type 1 Diabetes: INS, PTPN22, GAD65 (autoimmune targets)

📊 Data Sources

  • 📚 Literature: PubMed (35M+ papers) + Europe PMC fallback
  • 💊 Bioactivity: ChEMBL (2M+ compounds, IC50/Ki values)
  • 🧬 Structures: RCSB PDB (200K+ protein structures)
  • 🤖 Intelligence: OpenAI GPT-4 for query understanding & summaries

🚀 Production Features

  • ✅ Async/await architecture for performance
  • ✅ SQLite caching with intelligent TTL
  • ✅ Rate limiting and exponential backoff
  • ✅ Comprehensive logging and error handling
  • ✅ Works offline with cached data
  • ✅ No API keys required for basic functionality
  • ✅ LLM-powered summaries for publication-ready output
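
A minimal sketch of SQLite caching with a time-to-live, using only the standard library. The table layout, the `SQLiteCache` class name, and the one-hour default TTL are assumptions for illustration; the project's `cache.py` may be organized differently.

```python
import json
import sqlite3
import time

class SQLiteCache:
    """Tiny key/value cache whose entries expire after `ttl_seconds`."""

    def __init__(self, path=":memory:", ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def set(self, key, value, now=None):
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
            (key, json.dumps(value), now),
        )
        self.db.commit()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        # Treat missing and expired entries the same way: a cache miss.
        if row is None or now - row[1] > self.ttl:
            return None
        return json.loads(row[0])

cache = SQLiteCache(ttl_seconds=60)
cache.set("chembl:EGFR", {"inhibitors": 24}, now=1000.0)
print(cache.get("chembl:EGFR", now=1030.0))
```

Passing `now` explicitly makes expiry easy to test; in normal use both methods fall back to `time.time()`.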

📋 Requirements

  • Python 3.8+
  • Dependencies: pip install -r requirements.txt
  • Optional: OpenAI API key for enhanced LLM features
  • Optional: NCBI API key for higher PubMed rate limits

🧪 Real-World Use Cases

🔬 Academic Research

# Rare disease research
python cli.py "cystic fibrosis therapeutic targets"
python cli.py "Huntington's disease protein targets"

# Comparative analysis
python cli.py "CDK4, CDK6, CDK9 kinase inhibitor comparison"
python cli.py "EGFR, HER2, HER3 receptor family analysis"

# Structural biology
python cli.py "Find membrane protein targets with crystal structures"

🏭 Pharmaceutical Development

# Target assessment
python cli.py "Assess druggability of KRAS G12C mutant"
python cli.py "Find backup targets for failed Alzheimer's programs"

# Competitive intelligence
python cli.py "JAK family inhibitors clinical pipeline"
python cli.py "PD-1, PD-L1, CTLA-4 immunotherapy targets"

# Portfolio planning
python cli.py "Oncology targets with sub-10 nM inhibitors"

🏥 Clinical Decision Support

# Resistance mechanisms
python cli.py "EGFR resistance mutation targets"
python cli.py "Find alternative targets for chemotherapy resistance"

# Combination therapy
python cli.py "DNA repair pathway targets for PARP combination"
python cli.py "Immunotherapy combination targets"

# Personalized medicine
python cli.py "BRCA1/2 deficient cancer targets"

💊 Drug Repurposing

# Find new indications
python cli.py "Antiviral targets with existing inhibitors"
python cli.py "Inflammation targets with FDA-approved drugs"

# Cross-disease analysis
python cli.py "Shared targets between diabetes and Alzheimer's"

🤝 Contributing

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open Pull Request

📄 License

MIT License - Open source for research and commercial use.

🙏 Citation

If you use Omics Oracle in your research, please cite:

@software{omics_oracle_2024,
  title={Omics Oracle: AI-Powered Therapeutic Target Discovery},
  author={Anugraha T},
  year={2024},
  url={https://github.com/anugrahat/omics-oracle-}
}

🔑 API Key Benefits Summary

Feature             | Without OpenAI      | With OpenAI Key
--------------------|---------------------|-------------------------
Query Understanding | Basic regex         | 🧠 Natural language
Disease Mapping     | 12 curated diseases | 🎯 Any disease
IC50 Filtering      | Manual parsing      | 🔬 Automatic detection
Results Summary     | Basic tables        | 📋 Clinical insights
Query Types         | Limited             | 🚀 Unlimited flexibility

Built for the drug discovery community

Advancing precision medicine through intelligent target discovery

🌟 Star this repo if it helps your research! 🌟
