An introductory machine learning tutorial project that demonstrates solubility (LogS) prediction for small molecules. It covers:
- Exploratory data analysis (EDA)
- PCA/t-SNE dimensionality reduction and clustering
- Feature construction with RDKit: basic physicochemical properties, molecular descriptors, and molecular fingerprints of small molecules
- Dataset splitting
- Evaluation metrics for regression tasks: PCC, R2, SMAPE, MAE, RMSE, ...
- Linear regression model training and evaluation
- CatBoost-based decision tree model training and evaluation
- Model interpretability
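
As a rough illustration of the regression metrics listed above, the sketch below computes PCC, R2, SMAPE, MAE, and RMSE with the Python standard library only. The function name `regression_metrics` and the sample values are hypothetical and are not taken from this project's code. SMAPE has several definitions in the literature; this sketch assumes the form mean(2|p - t| / (|t| + |p|)), which may differ from the one used in the project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return PCC, R2, SMAPE, MAE and RMSE for two equal-length sequences.

    Hypothetical helper for illustration; not the project's own implementation.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean absolute error and root mean squared error.
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Pearson correlation coefficient between targets and predictions.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    if ss_tot > 0 and ss_pred > 0:
        pcc = cov / math.sqrt(ss_tot * ss_pred)
    else:
        pcc = float("nan")

    # SMAPE (assumed definition): mean of 2|p - t| / (|t| + |p|), range [0, 2].
    # Pairs where both values are zero contribute 0 to the sum.
    smape_terms = []
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        smape_terms.append(2.0 * abs(p - t) / denom if denom > 0 else 0.0)
    smape = sum(smape_terms) / n

    return {"PCC": pcc, "R2": r2, "SMAPE": smape, "MAE": mae, "RMSE": rmse}


if __name__ == "__main__":
    # Made-up LogS-like values, purely for demonstration.
    y_true = [-1.0, -2.0, -3.0, -4.0]
    y_pred = [-1.5, -2.0, -2.5, -4.0]
    for name, value in regression_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

Absolute values appear in the SMAPE denominator because the targets can be negative, which is common for LogS. In practice, libraries such as scikit-learn and SciPy provide tested versions of most of these metrics.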
For suggestions, please contact 328792@qq.com.