Drug-Binding-Protein

An ML model built on the protein 3D structure dataset provided by AlphaFold to determine which parts of a protein are able to bind to pharmaceutical drugs.

In the notebook, I compare multiple models, such as XGBoost, LightGBM, and K-Nearest Neighbors. Since this is an extremely imbalanced dataset, catching as many true positives as possible is most important, so a model that produces more false positives than false negatives is preferred.
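The preference for false positives over false negatives can be made concrete by counting the four confusion-matrix cells. The sketch below is a minimal, dependency-free illustration and not the notebook's code: the helper names (`confusion_counts`, `recall`, `precision`) and the toy labels are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = binding site."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of real positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of predicted positives that are real positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: 2 positives among 10 samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"recall={recall(tp, fn):.2f} precision={precision(tp, fp):.2f}")
```

In this toy case the model misses no positives (FN = 0) at the cost of two false alarms, which is the trade-off described above: recall is favored over precision.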

There are multiple ways to handle an imbalanced dataset. After a lot of testing based on the metrics reported by sklearn (F1 score and precision), using class weights works better than oversampling methods (SMOTE, SVMSMOTE) or undersampling methods (NearMiss). With class weights, the F1 score is 37% for the positive class and 98% for the negative class. The ROC curve is less important when evaluating the class-weighted models because the dataset is imbalanced.
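As a rough illustration of the class-weight idea, the sketch below computes "balanced" weights with the formula `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn documents for `class_weight="balanced"`, and the negative-to-positive ratio that is commonly used for XGBoost's `scale_pos_weight`. The helper names, the 95/5 class split, and the toy predictions are assumptions for this example; the weights actually used in the notebook are not stated here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical label distribution: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# Ratio of negatives to positives, a common starting value
# for XGBoost's scale_pos_weight parameter.
scale_pos_weight = labels.count(0) / labels.count(1)

print(weights)
print(scale_pos_weight)

# Per-class F1 on a tiny hypothetical prediction set.
toy_true = [0, 0, 1, 1]
toy_pred = [0, 1, 1, 1]
print(f1_for_class(toy_true, toy_pred, 1), f1_for_class(toy_true, toy_pred, 0))
```

The rare class receives the larger weight, so misclassifying a positive costs the model more during training. Reporting F1 per class, rather than a single accuracy number, keeps the weak positive-class performance visible next to the strong negative-class score.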

After training and evaluating the models, a second dataset without labels is passed to the model, which predicts each entry's compatibility with drug binding.

Dataset:
https://drive.google.com/file/d/1H6oqtp9buAjO8NKQEW_jDzRd-4-qgQPF/view?usp=sharing
https://drive.google.com/file/d/1pr2_xiH7gEOnPtg8yqSevZDF_l0ak387/view?usp=sharing


Repository: w12l3-c/Drug-Binding-Protein-Prediction
