Cytopathic Effect Prediction App

Project Overview

This project is a machine learning application that predicts whether a molecule has a cytopathic effect, based on the molecule's SMILES representation. The model is a RandomForestClassifier trained on molecular fingerprint data derived from SMILES strings. The project includes both the model training code and a Streamlit app for making predictions on new molecules and visualizing the model's performance metrics.

For more details, visit the GitHub repository (suhaasteja/Cytopathic-Effect-Prediction).

Technologies Used

RDKit: To convert molecular SMILES strings into fingerprints.
Scikit-learn: To train the Random Forest model and evaluate performance metrics.
Imbalanced-learn: To balance the dataset using SMOTE (Synthetic Minority Over-sampling Technique).
Joblib: To save and load the model and scaler.
Streamlit: To build an interactive web app for predictions and model performance visualization.

Model Training Process

Data Preparation:
- The SMILES strings are converted into molecular fingerprints using RDKit.
- The dataset is balanced using SMOTE to handle any class imbalance.
- The data is split into training and test sets.
- The training data is scaled using StandardScaler.
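
The fingerprint step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the source does not say which fingerprint type or size is used, so the Morgan fingerprint, `radius=2`, `n_bits=2048`, and the helper name `smiles_to_fingerprint` are all assumptions.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem


def smiles_to_fingerprint(smiles, radius=2, n_bits=2048):
    """Return a 0/1 NumPy vector for a SMILES string, or None if parsing fails.

    Assumption: Morgan fingerprints; the project may use a different type.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        return None
    bit_vect = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros(n_bits, dtype=np.uint8)
    arr[list(bit_vect.GetOnBits())] = 1  # set the positions of the "on" bits
    return arr


ethanol_fp = smiles_to_fingerprint("CCO")
print(ethanol_fp.shape, int(ethanol_fp.sum()))
```

Returning `None` for an unparseable string lets the caller decide whether to skip the row or report an error to the user.
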
Model Training:
- A RandomForestClassifier is trained on the scaled training data.
- The model and the scaler are saved using Joblib for future use.
Model Evaluation:
- The model's accuracy is calculated on the test set.
- A classification report is generated to summarize performance metrics such as precision, recall, and F1-score.
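
A minimal sketch of the split, scale, train, evaluate, and save steps is shown below. It runs on synthetic data that is balanced by construction, so the SMOTE step is not shown. The feature width, hyperparameters, output file names, and the mapping 0 = inactive, 1 = active are assumptions for illustration rather than values taken from the project.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 random 64-bit "fingerprints", 100 per class.
X = rng.integers(0, 2, size=(200, 64)).astype(float)
y = np.array([0] * 100 + [1] * 100)
X[y == 1, :8] = 1.0  # make the first 8 bits informative so there is signal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler().fit(X_train)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(scaler.transform(X_train), y_train)

y_pred = model.predict(scaler.transform(X_test))
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=["inactive", "active"]))

# Persist both objects so an app can load them later (file names are hypothetical).
out_dir = tempfile.mkdtemp()
joblib.dump(model, os.path.join(out_dir, "model.pkl"))
joblib.dump(scaler, os.path.join(out_dir, "scaler.pkl"))
```

Saving the scaler together with the model matters because new inputs must be transformed with the same fitted scaler before prediction.
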

Streamlit App Features

The Streamlit app provides the following functionalities:

1. Single Molecule Prediction:

Input: A SMILES string representing a molecule (e.g., "CCO" for ethanol).
Output: The prediction of whether the molecule has a cytopathic effect (active or inactive) and the probability score of the prediction.
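
One way the prediction step could look is sketched below. The helper `predict_activity`, the 0.5 decision threshold, and the encoding of the active class as label 1 are assumptions; the tiny model is fitted on made-up data only so the helper can be exercised without the saved model files.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def predict_activity(model, scaler, fingerprint):
    """Return (label, probability of the active class) for one fingerprint."""
    features = scaler.transform(np.asarray(fingerprint, dtype=float).reshape(1, -1))
    # Find the predict_proba column for class 1 (assumed to mean "active").
    active_col = list(model.classes_).index(1)
    p_active = float(model.predict_proba(features)[0, active_col])
    label = "active" if p_active >= 0.5 else "inactive"
    return label, p_active


# Tiny hypothetical model; a real app would load the saved model and scaler.
X = np.array([[0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 1, 1], [1, 1, 0, 1]] * 5, dtype=float)
y = np.array([0, 0, 1, 1] * 5)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(
    scaler.transform(X), y
)

label, p_active = predict_activity(model, scaler, [1, 0, 1, 1])
print(label, round(p_active, 2))
```

In the app, the fingerprint passed to such a helper would be computed from the SMILES string the user enters.
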

2. Model Performance Metrics:

The app visualizes key performance metrics based on test data. This includes:

Confusion Matrix: A matrix displaying the number of true positive, true negative, false positive, and false negative predictions.
ROC Curve: A graph showing the trade-off between the true positive rate (sensitivity) and the false positive rate (1 − specificity) across decision thresholds. The ROC-AUC score is also displayed.
Prediction Distribution: A bar chart showing the distribution of predicted classes (active/inactive).
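
The numbers behind these three views can be computed with scikit-learn, as in the sketch below. The labels and scores are made-up values for illustration, and the 0.5 threshold and the encoding 1 = active are assumptions. Plotting is left out so the example stays focused on the metrics.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

# Hypothetical test labels (1 = active) and predicted probabilities of "active".
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
y_pred = (y_score >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# ROC curve points and the area under the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# Counts behind a predicted-class bar chart.
class_counts = np.bincount(y_pred, minlength=2)

print(cm)
print(f"ROC-AUC: {auc:.2f}")
print(dict(zip(["inactive", "active"], class_counts.tolist())))
```

The arrays `fpr` and `tpr` are what a plotting library would draw as the ROC curve, and `class_counts` supplies the bar heights for the prediction distribution.
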

Input and Output

Input

Single Molecule Prediction: The app requires a SMILES string entered in the text box. Examples:
- C1=CC=CC=C1 (Benzene)
- CCO (Ethanol)

Output

The output of the app includes:
- A prediction of whether the molecule has a cytopathic effect (active or inactive).
- The probability associated with the prediction (e.g., 0.75 for a 75% chance that the molecule is active).
- Visualizations such as confusion matrix, ROC curve, and prediction distribution.

How to Run

Install dependencies: Run the following command to install the necessary packages:
```
pip install rdkit-pypi scikit-learn joblib imbalanced-learn streamlit
```
