HIV Drug Prediction Model

A machine learning model that predicts which chemical compounds are likely to be active against HIV, helping researchers focus on the most promising candidates.

Overview

This project uses QSAR (Quantitative Structure-Activity Relationship) modeling to predict HIV drug compound efficacy. It helps pharmaceutical researchers identify promising compounds before expensive lab testing.

Key Benefits:

Reduce screening time from months to hours
Focus resources on high-probability compounds
Improve success rates in drug discovery

Features

Complete ML pipeline from data preprocessing to deployment
REST API for real-time predictions
Model monitoring and performance tracking
Batch processing for large compound libraries
Docker containerization for easy deployment

Dataset

Source: NCI AIDS Antiviral Screen Data

40,000+ HIV-tested compounds
Activity classes: CA (Active), CM (Moderately Active), CI (Inactive)
EC50/IC50 measurements
Molecular structure data
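
The activity classes need a numeric encoding before a model can be trained on them. The following is a minimal sketch, not the project's actual preprocessing: the helper name `encode_activity` is hypothetical, and the choice to merge CA and CM into a single "active" class is an assumption (a common convention for this dataset, but the repository may keep all three classes).

```python
# Hypothetical label encoding for the NCI activity classes.
# Assumption: CA and CM both count as "active" (1), CI as "inactive" (0).
BINARY_LABELS = {"CA": 1, "CM": 1, "CI": 0}
# Alternative ordinal encoding that keeps all three classes distinct.
ORDINAL_LABELS = {"CI": 0, "CM": 1, "CA": 2}


def encode_activity(code, binary=True):
    """Map an activity code such as 'CA' to an integer label.

    Raises ValueError for codes outside CA/CM/CI so that bad rows
    are noticed instead of being silently mislabeled.
    """
    table = BINARY_LABELS if binary else ORDINAL_LABELS
    key = code.strip().upper()
    if key not in table:
        raise ValueError(f"unknown activity code: {code!r}")
    return table[key]


if __name__ == "__main__":
    print([encode_activity(c) for c in ["CA", "CM", "CI"]])
    print([encode_activity(c, binary=False) for c in ["CA", "CM", "CI"]])
```

The binary form suits a simple active/inactive screen; the ordinal form is the one to use if moderately active compounds should stay distinguishable.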

Technology Stack

Language: Python 3.8+
ML Libraries: scikit-learn, XGBoost, RDKit
API: FastAPI
Database: PostgreSQL
MLOps: MLflow
Deployment: Docker

Model Performance

Metric	Value
Accuracy	87.3%
F1-Score	0.84
Cohen's Kappa	0.79
AUC-ROC	0.91
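
Cohen's kappa is less familiar than accuracy, so a small worked computation may help when reading the table. This sketch is illustrative only: the labels are made-up toy data, not project results, and `cohen_kappa` is a hypothetical helper written for this example (for real use, scikit-learn provides `sklearn.metrics.cohen_kappa_score`).

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of positions where the labels match.
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if the two labelings were statistically independent.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_expected = sum(
        true_counts[label] * pred_counts[label] for label in true_counts
    ) / (n * n)
    if p_expected == 1.0:
        # Degenerate case: both labelings use one identical class.
        # Treated here as perfect agreement to avoid dividing by zero.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy labels only; these are not outputs of the project's model.
    y_true = ["CA", "CI", "CI", "CM", "CI", "CA"]
    y_pred = ["CA", "CI", "CM", "CM", "CI", "CI"]
    print(round(cohen_kappa(y_true, y_pred), 3))  # 0.478
```

In the toy data four of six labels match (observed agreement 0.667), but chance alone would give 0.361, so kappa comes out well below raw accuracy. That gap is why kappa is worth reporting on imbalanced screening data.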

Quick Start

Installation

# Clone repository
git clone https://github.com/pari1jay/Prediction-Model-HC.git
cd Prediction-Model-HC

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

Running the Application

# Development mode
python src/main.py

# Production with Docker
docker-compose up -d

API Usage

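import requests

# Predict compound activity from a SMILES string
response = requests.post(
    "http://localhost:8000/predict",
    json={"smiles": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"},
    timeout=30,  # avoid hanging forever if the service is down
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())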

Usage Examples

Training a Model

from src.training.pipeline import TrainingPipeline

pipeline = TrainingPipeline()
pipeline.load_data("data/hiv_compounds.csv")
pipeline.preprocess()
pipeline.train()
pipeline.evaluate()

Making Predictions

from src.prediction.predictor import CompoundPredictor

predictor = CompoundPredictor.load("models/best_model.pkl")
result = predictor.predict_smiles("CCO")
print(f"Activity: {result['activity']}, Confidence: {result['confidence']:.3f}")

Batch Processing

python scripts/batch_predict.py --input compounds.csv --output predictions.csv
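
The internals of `scripts/batch_predict.py` are not shown in this README. As a rough sketch of what such a batch step could look like, the function below streams rows from an input CSV, calls a prediction function on each SMILES string, and writes one result row per compound. The column names (`smiles`, `activity`, `confidence`, `error`), the `predict_fn` callback, and the policy of recording failures per row are all assumptions made for illustration.

```python
import csv
import io


def batch_predict(in_file, out_file, predict_fn, smiles_column="smiles"):
    """Read SMILES from in_file and write one prediction row per compound.

    predict_fn is a hypothetical callable that takes a SMILES string and
    returns a dict with 'activity' and 'confidence' keys. Rows that fail
    are written with an 'error' message instead of aborting the run.
    Returns the number of rows processed.
    """
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(
        out_file, fieldnames=[smiles_column, "activity", "confidence", "error"]
    )
    writer.writeheader()
    count = 0
    for row in reader:
        smiles = (row.get(smiles_column) or "").strip()
        out = {smiles_column: smiles, "activity": "", "confidence": "", "error": ""}
        try:
            if not smiles:
                raise ValueError("empty SMILES")
            result = predict_fn(smiles)
            out["activity"] = result["activity"]
            out["confidence"] = f"{result['confidence']:.3f}"
        except Exception as exc:  # keep going; record the failure for this row
            out["error"] = str(exc)
        writer.writerow(out)
        count += 1
    return count


if __name__ == "__main__":
    # Stand-in predictor so the sketch runs without a trained model.
    def fake_predict(smiles):
        return {"activity": "CI", "confidence": 0.5}

    source = io.StringIO("id,smiles\n1,CCO\n2,\n3,CC(=O)O\n")
    target = io.StringIO()
    print(batch_predict(source, target, fake_predict))
    print(target.getvalue())
```

Processing row by row keeps memory use flat for large compound libraries, and writing errors into the output means a single malformed structure does not discard the rest of the batch.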

Project Structure

Prediction-Model-HC/
├── src/
│   ├── data/           # Data processing
│   ├── features/       # Feature engineering
│   ├── models/         # ML models
│   ├── training/       # Training pipeline
│   ├── prediction/     # Prediction service
│   └── api/            # REST API
├── data/              # Datasets
├── models/            # Trained models
├── scripts/           # Utility scripts
├── tests/             # Test files
└── docs/              # Documentation

Data Processing Pipeline

1. Data Integration: Merge screening results, EC50/IC50 values, and molecular structures
2. Quality Control: Handle duplicates, conflicts, and missing data
3. Feature Engineering: Calculate molecular descriptors and fingerprints
4. Model Training: Train and validate multiple ML models
5. Evaluation: Assess model performance using relevant metrics
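
To make the quality-control step concrete, here is one possible policy sketched in plain Python: drop records with missing fields, collapse exact duplicates, and set aside compounds whose repeated screens disagree on the activity class. The record layout (`id`, `smiles`, `activity` keys), the function name `clean_records`, and the discard-on-conflict rule are assumptions; the project may resolve conflicts differently, for example by majority vote.

```python
from collections import defaultdict


def clean_records(records):
    """Return (clean, conflicting_ids) from a list of screening records.

    Each record is a dict with the hypothetical keys 'id', 'smiles',
    and 'activity'.
    """
    labels_by_id = defaultdict(set)
    smiles_by_id = {}
    for rec in records:
        cid = rec.get("id")
        smiles = rec.get("smiles")
        activity = rec.get("activity")
        # Missing data: skip records lacking any required field.
        if not cid or not smiles or not activity:
            continue
        labels_by_id[cid].add(activity)
        smiles_by_id.setdefault(cid, smiles)

    clean, conflicting = [], []
    for cid, labels in labels_by_id.items():
        if len(labels) > 1:
            # Conflict: the same compound received different activity classes.
            conflicting.append(cid)
        else:
            # Duplicates with identical labels collapse to a single record.
            clean.append(
                {"id": cid, "smiles": smiles_by_id[cid], "activity": next(iter(labels))}
            )
    return clean, conflicting


if __name__ == "__main__":
    # Toy records for illustration; not taken from the real dataset.
    sample = [
        {"id": "1", "smiles": "CCO", "activity": "CI"},
        {"id": "1", "smiles": "CCO", "activity": "CI"},   # exact duplicate
        {"id": "2", "smiles": "CCN", "activity": "CA"},
        {"id": "2", "smiles": "CCN", "activity": "CI"},   # conflicting label
        {"id": "3", "smiles": None, "activity": "CM"},    # missing structure
    ]
    print(clean_records(sample))
```

Returning the conflicting IDs separately, rather than silently dropping them, leaves room to inspect those compounds by hand or apply a different resolution rule later.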

Contributing

1. Fork the repository
2. Create a feature branch (git checkout -b feature/new-feature)
3. Commit your changes (git commit -m 'Add new feature')
4. Push to the branch (git push origin feature/new-feature)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

National Cancer Institute for the AIDS Antiviral Screen Data
RDKit community for cheminformatics tools
Open source contributors

Contact

Pari Jay - GitHub

Project Link: https://github.com/pari1jay/Prediction-Model-HC
