This project showcases results obtained for the Longevity hackathon.
A Streamlit-based web application for biomedical drug research and discovery, integrating multiple pharmaceutical and biomedical databases for comprehensive drug analysis and exploration.
- Multi-Database Integration: Combines data from DrKG, DrugBank, MeSH, SIDER, DOID, and HGNC
- Interactive Web Interface: Built with Streamlit for intuitive data exploration
- Containerized Deployment: Docker-based setup for consistent environments
- Knowledge Graph Analysis: Leverages drug repurposing knowledge graphs
- Disease Ontology Integration: Incorporates structured disease classifications
- Docker and Docker Compose
- Make (optional, for convenience commands)
- Minimum 4GB RAM recommended
- At least 2GB free disk space for data and containers
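The prerequisites above can be checked before building. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, it only looks for the `docker` and `make` executables on `PATH`, and it does not verify Docker Compose or available RAM.

```python
import shutil

# 2 GB of free disk space, the figure given in the prerequisites list.
MIN_FREE_BYTES = 2 * 1024**3


def check_prerequisites(path="."):
    """Return a dict mapping each checked prerequisite to True or False."""
    free_bytes = shutil.disk_usage(path).free
    return {
        # shutil.which returns None when the executable is not on PATH.
        "docker": shutil.which("docker") is not None,
        "make": shutil.which("make") is not None,  # optional
        "disk_space": free_bytes >= MIN_FREE_BYTES,
    }
```

Calling `check_prerequisites()` from the project directory returns a dictionary such as `{"docker": ..., "make": ..., "disk_space": ...}`; a `False` for `make` is not blocking, since Make is optional.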
The application integrates the following biomedical databases:
data/
├── doid/ # Disease Ontology
│ └── doid.obo # Disease classifications and relationships
├── drkg/ # Drug Repurposing Knowledge Graph
│ ├── drkg.tsv # Main knowledge graph data
│ ├── graph.gml # Graph structure file
│ ├── relation_glossary.tsv # Relationship definitions
│ └── relation_glossary.xlsx
├── drugbank/ # DrugBank Database
│ └── drugbank_vocabulary.csv # Drug nomenclature and metadata
├── hgnc/ # HUGO Gene Nomenclature Committee
│ └── HGNC_complete_set.tsv # Official gene symbols and names
├── mesh/ # Medical Subject Headings
│ └── desc2025.xml # Medical terminology hierarchy
├── sider/ # Side Effect Resource
│ └── meddra_all_indications.tsv # Drug indications (MedDRA terms)
├── drugbank_vocabulary.csv # Additional drug vocabulary
└── entity_name_mapping.json # Entity name mappings across databases
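To confirm that the layout above is complete before starting the application, a small check along these lines may help. This is a sketch only: the helper `find_missing_data_files` is hypothetical, and the file list is copied from the tree rather than taken from the project code.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_FILES = [
    "doid/doid.obo",
    "drkg/drkg.tsv",
    "drkg/graph.gml",
    "drkg/relation_glossary.tsv",
    "drkg/relation_glossary.xlsx",
    "drugbank/drugbank_vocabulary.csv",
    "hgnc/HGNC_complete_set.tsv",
    "mesh/desc2025.xml",
    "sider/meddra_all_indications.tsv",
    "drugbank_vocabulary.csv",
    "entity_name_mapping.json",
]


def find_missing_data_files(data_dir="data"):
    """Return the expected files that are absent under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means every file from the tree is in place; otherwise the returned paths show what still has to be downloaded or copied.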
- DOID: Disease Ontology - Standardized disease classifications
- DrKG: Drug Repurposing Knowledge Graph - Comprehensive biomedical knowledge graph
- DrugBank: DrugBank Database - Pharmaceutical knowledge base
- HGNC: HUGO Gene Nomenclature Committee - Official gene symbols
- MeSH: Medical Subject Headings - NLM's controlled vocabulary
- SIDER: Side Effect Resource - Drug side effects database
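As an illustration of how the knowledge graph can be explored, the sketch below reads triples from `drkg.tsv`. It assumes the layout of the public DRKG release: three tab-separated columns (head, relation, tail) with no header row, and entity identifiers of the form `Type::ID`. The function names are hypothetical and are not taken from the project code.

```python
import csv
from collections import Counter


def read_triples(path):
    """Yield (head, relation, tail) tuples from a DRKG-style TSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps any quote characters in identifiers untouched.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) == 3:  # skip blank or malformed lines
                yield tuple(row)


def entity_type(entity):
    """Return the type prefix of an entity such as 'Compound::DB00001'."""
    return entity.split("::", 1)[0]


def count_head_types(path):
    """Count how often each entity type appears as the head of a triple."""
    return Counter(entity_type(head) for head, _, _ in read_triples(path))
```

For example, `count_head_types("data/drkg/drkg.tsv")` gives a quick overview of which entity types dominate the graph. Reading the file as a stream avoids loading every triple into memory at once.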
Place the following file in the access/ directory to enable Google Drive access: access/service_account.json.
Instructions on creating a Google Drive client and connecting it to the code can be found here.
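A quick sanity check of the key file can catch a misplaced or truncated download early. The sketch below assumes the file is a standard Google Cloud service account key, which is a JSON object containing `type`, `client_email`, and `private_key` among other fields. The helper `check_service_account` is hypothetical, and it does not contact Google or validate the credentials themselves.

```python
import json
from pathlib import Path

# Fields present in a standard service account key (not an exhaustive list).
REQUIRED_KEYS = ("type", "client_email", "private_key")


def check_service_account(path="access/service_account.json"):
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]
    try:
        data = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{key_path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']!r}")
    return problems
```

An empty list means the file looks structurally plausible; any returned message points at what to fix before starting the containers.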
# Clone the repository
git clone <repository-url>
cd <project-directory>
# Build and start all services
make
Don't forget to place the Google Drive client JSON file in access/ before starting.
# Set environment variables
# (bash already defines UID as a read-only variable, so export it without assigning)
export UID
export GID=$(id -g)
# Build and start
./run.sh
Once the containers are running, access the Streamlit application at:
http://localhost:8501
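To verify from a script that the application is actually serving on that address, a plain HTTP request is enough. The sketch below uses only the standard library; the function name `is_reachable` is hypothetical, and it checks only that the root URL answers with a 2xx status, not that the app has finished loading its data.

```python
import urllib.request


def is_reachable(url="http://localhost:8501", timeout=3.0):
    """Return True if an HTTP GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts
        # are all subclasses of OSError.
        return False
```

Calling `is_reachable()` right after `make up` may return `False` for a few seconds while the container starts, so retrying with a short pause is reasonable.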
make help # Show all available commands
make build # Build all containers
make up # Start containers in detached mode
make down # Stop all containers
make logs # View container logs
make ps # Show container status
make restart # Restart services
make clean # Clean volumes (⚠️ removes data)
# Access container shell
make shell SERVICE=streamlit_app
# View logs in real-time
make logs
# Restart specific service
make restart SERVICE=streamlit_app
- Streamlit App: Port 8501 (configurable in docker-compose.yml)
- ./streamlit_app → /app/streamlit_app (Application code)
- ./data → /app/data (Database files)
- ./access → /app/access (Access control files)
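When a script runs inside the container, repository-relative paths have to be translated to their mounted locations. The sketch below hard-codes the three mappings listed above; the names `VOLUME_MOUNTS` and `to_container_path` are hypothetical, and the authoritative source for the mounts remains docker-compose.yml.

```python
from pathlib import PurePosixPath

# Host directory (relative to the repository root) -> container directory.
VOLUME_MOUNTS = {
    "streamlit_app": "/app/streamlit_app",
    "data": "/app/data",
    "access": "/app/access",
}


def to_container_path(host_path):
    """Map a repo-relative host path to its location inside the container."""
    # PurePosixPath drops a leading "./", so "./data/x" and "data/x" match.
    parts = PurePosixPath(host_path).parts
    if parts and parts[0] in VOLUME_MOUNTS:
        return str(PurePosixPath(VOLUME_MOUNTS[parts[0]], *parts[1:]))
    raise ValueError(f"{host_path!r} is not under a mounted directory")
```

For instance, `to_container_path("./data/drkg/drkg.tsv")` yields `/app/data/drkg/drkg.tsv`, while a path outside the three mounted directories raises `ValueError`.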
.
├── docker-compose.yml # Docker services configuration
├── Dockerfile # Container build instructions
├── Makefile # Development convenience commands
├── requirements.txt # Python dependencies
├── project_config.py # Project configuration
├── streamlit_app/ # Streamlit application code
│ └── root_page.py # Main application entry point
├── app/ # Core application modules
├── research_scripts/ # Research and analysis scripts
├── data/ # Database files (see structure above)
└── access/ # Access control and authentication
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes
- Test with: make build && make up
- Submit a pull request
This project is licensed under the MIT License.
Maintainers: mikhail.solovyanov@gmail.com