ChemXploreML Python Backend

This repository contains the Python backend for the ChemXploreML desktop application, which implements the machine learning framework described in the paper: Machine Learning Pipeline for Molecular Property Prediction Using ChemXploreML.

Please visit the Documentation to download the desktop application. To access the desktop application source code, please visit the ChemXploreML repository.

Overview

ChemXploreML is a powerful machine learning framework designed for chemical space exploration and molecular property prediction. This Python backend provides the core functionality for:

  • Molecular feature generation and representation
  • Machine learning model training and evaluation
  • Chemical space visualization
  • Property prediction and uncertainty estimation
  • Model interpretation and explainability
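
To make the list above concrete, here is a minimal sketch of a train/evaluate/uncertainty loop. It is not taken from the backend's code: the feature matrix is random stand-in data rather than real molecular descriptors, the model is a plain scikit-learn random forest, and the per-tree spread is only one simple way to approximate prediction uncertainty. The names `X`, `y`, `y_pred`, and `y_std` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a molecular feature matrix (rows = molecules,
# columns = descriptor values) and a target property. Not real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 25% of the molecules for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Point predictions and a goodness-of-fit score on the held-out set.
y_pred = model.predict(X_test)
score = r2_score(y_test, y_pred)

# Crude uncertainty estimate (an assumption of this sketch): the spread of
# the individual trees' predictions for each held-out molecule.
per_tree = np.stack([tree.predict(X_test) for tree in model.estimators_])
y_std = per_tree.std(axis=0)

print(f"R^2 on held-out set: {score:.3f}")
print(f"mean per-molecule std: {y_std.mean():.3f}")
```

In practice the desktop application drives these steps through the backend, so a script like this is only useful for understanding the workflow.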

Features

  • Advanced ML Algorithms: Support for XGBoost, LightGBM, CatBoost, and scikit-learn models
  • Chemical Space Analysis: Integration with PCA, UMAP, t-SNE, KernelPCA, PHATE, ISOMAP, LaplacianEigenmaps, TriMap and FactorAnalysis for dimensionality reduction
  • Model Optimization: Hyperparameter tuning with Optuna
  • Task Queue: Asynchronous processing with Redis and RQ
  • Data Quality: Integration with CleanLab for data quality assessment
  • Deep Learning: Support for transformer-based models and custom neural networks (soon to be added)
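
As an illustration of the dimensionality-reduction step behind chemical space analysis, the sketch below projects a feature matrix onto its first two principal components using NumPy only. It is a hedged example, not the backend's implementation: `pca_project` is a hypothetical helper, and the random `fingerprints` array stands in for real molecular embeddings.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of `features` onto their top principal components."""
    # PCA requires mean-centered columns.
    centered = features - features.mean(axis=0)
    # Rows of `vt` are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Placeholder data: 100 "molecules" with 32 features each (not real data).
rng = np.random.default_rng(42)
fingerprints = rng.normal(size=(100, 32))

coords = pca_project(fingerprints)
print(coords.shape)  # (100, 2)
```

The resulting two columns can be passed straight to a scatter plot. Nonlinear methods such as UMAP or t-SNE follow the same fit-then-project pattern through their own libraries.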

Requirements

  • Rye package manager

Installation

  1. Clone the repository:
git clone https://github.com/aravindhnivas/cxml_py.git
cd cxml_py
  2. With Rye installed, sync the dependencies (this creates the virtual environment in .venv), then activate it:
rye sync

# for unix/macOS
source .venv/bin/activate

# or for windows
.venv\Scripts\activate
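
After activation, a short check like the following can confirm that the interpreter is running inside the virtual environment and that key packages resolve. This is a sketch under assumptions: the module names in `expected` are guesses based on the libraries this README mentions, and `cxml_lib` is assumed to be importable under the name of its directory in `src/`.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust to match pyproject.toml.
    expected = ["cxml_lib", "numpy", "sklearn", "xgboost", "optuna"]
    print("virtual environment active:", in_virtualenv())
    missing = missing_modules(expected)
    print("missing modules:", ", ".join(missing) if missing else "none")
```

If any module is reported missing, re-running `rye sync` is the first thing to try.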

Usage

  • Start the desktop application ChemXploreML.
  • Navigate to the 'Settings' tab to start the server.

Project Structure

cxml_py/
├── src/
│   └── cxml_lib/        # Core library code
├── pyproject.toml      # Project configuration
├── requirements.lock   # Locked dependencies
└── README.md           # This file

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this software in your research, please cite:

Marimuthu, A. N.; McGuire, B. A. Machine Learning Pipeline for Molecular Property Prediction Using ChemXploreML. J. Chem. Inf. Model. 2025. https://doi.org/10.1021/acs.jcim.5c00516.

Support

For support, please open an issue in the GitHub repository or contact aravindhnivas28@gmail.com.

Acknowledgments

I would like to thank the authors and maintainers of the following libraries for their invaluable contributions:

Core Scientific Computing

  • NumPy - Array computing and linear algebra
  • SciPy - Scientific computing and optimization
  • Pandas - Data manipulation and analysis
  • Dask - Parallel computing and task scheduling

Machine Learning

  • Scikit-learn - Machine learning algorithms
  • XGBoost - Gradient boosting framework
  • LightGBM - Light gradient boosting machine
  • CatBoost - Gradient boosting on decision trees
  • Optuna - Hyperparameter optimization
  • SHAP - Model interpretability
  • CleanLab - Data quality and label error detection

Deep Learning

Chemical Informatics

  • RDKit - Cheminformatics and machine learning
  • SELFIES - String-based molecular representation

Visualization

  • Matplotlib - Plotting library
  • Seaborn - Statistical data visualization
  • PHATE - Dimensionality reduction
  • UMAP - Uniform Manifold Approximation
  • TriMap - Dimensionality reduction

Web and API

Development Tools
