Deep Learning for Molecular and Genomic Characterization of Lung Cancer in Never-Smokers Using Hematoxylin and Eosin-Stained Images
Mutation_AI is a deep learning project for mutation analysis using Convolutional Neural Networks (CNNs) with two main modules:
- Multilabel_CNN: for multi-label classification tasks.
- Binary_CNN: for binary classification tasks.
Both modules implement a custom ResNet-50-like architecture using TensorFlow/Keras.
- Installation
- Project Structure
- Usage
- Data Structure
- Model Architecture
- Requirements
- Contributing
- License
Installation
- Clone the repository:
  git clone https://github.com/monjoybme/Mutation_AI.git
  cd Mutation_AI
- Install the dependencies. Using a Python virtual environment is recommended:
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
Project Structure
Mutation_AI/
│
├── Multilabel_CNN/
│ ├── model.py # Custom ResNet-50 model for multilabel classification
│ ├── main.py # Training/Inference script (if available)
│ └── ... # Additional utilities and scripts
│
├── Binary_CNN/
│ ├── model.py # Custom ResNet-50 model for binary classification
│ ├── main.py # Training/Inference script (if available)
│ └── ... # Additional utilities and scripts
│
├── requirements.txt
└── README.md
Usage
For Multilabel_CNN:
- Prepare your dataset as described in the Data Structure section.
- Configure parameters in Multilabel_CNN/main.py (if available).
- Run the training script:
  python Multilabel_CNN/main.py
For Binary_CNN:
- Prepare your dataset as described in the Data Structure section.
- Configure parameters in Binary_CNN/main.py (if available).
- Run the training script:
  python Binary_CNN/main.py
Data Structure
- The models expect input data in a format compatible with TensorFlow/Keras: batches of image tensors (typically shaped (batch, height, width, channels)) paired with label arrays.
- A typical directory structure for image data:
  data/
  ├── train/
  │ ├── class_1/
  │ │ ├── img001.png
  │ │ ├── img002.png
  │ │ └── ...
  │ └── class_2/
  │   └── ...
  └── val/
    ├── class_1/
    └── class_2/
- For multilabel tasks, a CSV file listing file paths and their corresponding label vectors is often used.
- For binary tasks, use two folders (e.g., positive/ and negative/) or a similar structure.
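The folder layouts above can be turned into (file path, label) pairs with a few lines of standard-library Python. The sketch below is not part of the repository: `index_image_folders` and `IMAGE_EXTENSIONS` are hypothetical names, and it assumes each immediate sub-folder of a split directory (such as data/train) holds exactly one class.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust to your data).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def index_image_folders(split_dir, class_names=None):
    """Collect (file_path, label_index) pairs from a class-per-folder layout.

    Each immediate sub-folder of ``split_dir`` (e.g. data/train) is one class.
    If ``class_names`` is omitted, folders are sorted alphabetically, so
    class_1 -> 0 and class_2 -> 1.
    """
    split_dir = Path(split_dir)
    if class_names is None:
        class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        # rglob("*") also picks up images stored in nested sub-folders.
        for path in sorted((split_dir / name).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((str(path), label))
    return samples, list(class_names)
```

For the binary layout, calling `index_image_folders("data/train", class_names=["negative", "positive"])` pins negative to 0 and positive to 1, which matches a single sigmoid output. Recent TensorFlow versions also provide `tf.keras.utils.image_dataset_from_directory`, which infers labels from the same folder layout; check the repository's own loading code for what it actually uses.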
Note: Adjust data loading utilities as needed for your specific data organization.
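For the multilabel CSV option, one possible layout is a path column followed by one 0/1 column per label. The sketch below assumes that layout (the column names `path`, `label_1`, `label_2`, the file name labels.csv and the function `read_multilabel_csv` are all hypothetical) and reads it into multi-hot label vectors with the standard `csv` module.

```python
import csv
import io


def read_multilabel_csv(csv_file, path_column="path"):
    """Read image paths and multi-hot label vectors from a CSV file object.

    Assumed (hypothetical) layout: one path column plus one 0/1 column per label:
        path,label_1,label_2,label_3
        data/train/img001.png,1,0,1
    """
    reader = csv.DictReader(csv_file)
    label_names = [name for name in reader.fieldnames if name != path_column]
    paths, vectors = [], []
    for row in reader:
        paths.append(row[path_column])
        # Labels are independent, so a row may contain zero, one or several 1s.
        vectors.append([int(row[name]) for name in label_names])
    return paths, vectors, label_names


if __name__ == "__main__":
    # In practice: with open("labels.csv", newline="") as f: read_multilabel_csv(f)
    sample = io.StringIO("path,label_1,label_2\nimg001.png,1,0\nimg002.png,1,1\n")
    print(read_multilabel_csv(sample))
    # (['img001.png', 'img002.png'], [[1, 0], [1, 1]], ['label_1', 'label_2'])
```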
Model Architecture
Both Multilabel_CNN/model.py and Binary_CNN/model.py implement a custom ResNet-50-like architecture using TensorFlow/Keras:
- Initial convolutional and pooling layers
- Multiple custom residual blocks (with optional shortcuts)
- Global average pooling
- Dense layers with dropout
- Output layer: sigmoid activation for both multilabel and binary classification
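To make the "ResNet-50" label concrete, the sketch below works through the standard ResNet-50 configuration: a stride-2 7x7 convolution, a stride-2 max pool, then four stages of 3, 4, 6 and 3 bottleneck blocks. It assumes 'same' padding and a 224x224 input by default, and it follows the usual convention of not counting projection shortcuts. Since the repository describes its model as ResNet-50-like and adds dense layers with dropout, its exact depth and shapes may differ; `resnet50_summary` is a hypothetical helper.

```python
import math

# Standard ResNet-50 stage layout (an assumption for the custom model):
# (number of bottleneck blocks, bottleneck width, stride of the first block)
STAGES = [
    (3, 64, 1),
    (4, 128, 2),
    (6, 256, 2),
    (3, 512, 2),
]
EXPANSION = 4  # a bottleneck's last 1x1 conv outputs 4x its width


def same_padding_output(size, stride):
    """Spatial size after a 'same'-padded conv/pool with the given stride."""
    return math.ceil(size / stride)


def resnet50_summary(input_size=224):
    size = same_padding_output(input_size, 2)  # initial 7x7 conv, stride 2
    size = same_padding_output(size, 2)        # 3x3 max pool, stride 2
    channels = 64                              # output channels of the 7x7 conv
    weighted_layers = 1                        # counts the 7x7 conv
    for blocks, width, stride in STAGES:
        size = same_padding_output(size, stride)  # downsampling in first block
        channels = width * EXPANSION
        weighted_layers += blocks * 3             # 1x1 -> 3x3 -> 1x1 per block
    weighted_layers += 1                          # single final dense layer
    # Global average pooling collapses size x size x channels to channels.
    return {"final_map": (size, size, channels),
            "pooled_features": channels,
            "weighted_layers": weighted_layers}


print(resnet50_summary())
```

For a 224x224 input this gives a 7x7x2048 final feature map, which global average pooling reduces to a 2048-dimensional vector, and 1 + 16 × 3 + 1 = 50 weighted layers, which is where the name comes from.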
The output layer has one unit per class for multilabel classification and a single unit for binary classification.
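Because both models use sigmoid outputs, each output unit is an independent probability rather than one entry of a softmax distribution. The sketch below shows how output logits would be turned into 0/1 predictions; the 0.5 threshold, the example logits and the helper `predict_labels` are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function: maps a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5):
    """Turn output-layer logits into independent 0/1 predictions.

    With sigmoid outputs every unit is its own yes/no decision, so a
    multilabel model with K units can predict any subset of the K labels,
    and a binary model is the K = 1 case.
    """
    probs = [sigmoid(z) for z in logits]
    return probs, [int(p >= threshold) for p in probs]


# Multilabel: three output units (hypothetical logits).
print(predict_labels([2.0, -1.0, 0.3])[1])  # [1, 0, 1]
# Binary: a single output unit.
print(predict_labels([-0.7])[1])            # [0]
```

Sigmoid outputs are usually trained with binary cross-entropy, and the threshold can be tuned per label on validation data; consult main.py (if available) for the loss and thresholds the project actually uses.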
Requirements
See requirements.txt for full details. Main dependencies:
- tensorflow>=2.0
- numpy
- pandas
- scikit-learn
- matplotlib (optional, for plotting)
- tqdm (optional, for progress bars)
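After installation, a quick way to confirm that the main dependencies are importable is a standard-library check like the one below. It is a hypothetical helper, not a script shipped with the repository; note that scikit-learn is imported as `sklearn`, so the distribution and module names differ.

```python
import importlib.util

# Distribution name -> import name for the dependencies listed above.
REQUIRED = {"tensorflow": "tensorflow", "numpy": "numpy", "pandas": "pandas",
            "scikit-learn": "sklearn"}
OPTIONAL = {"matplotlib": "matplotlib", "tqdm": "tqdm"}


def find_missing(packages):
    """Return the distribution names whose import module cannot be found."""
    return [dist for dist, module in packages.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        import tensorflow as tf
        # requirements.txt asks for tensorflow>=2.0; compare the major version.
        major = int(tf.__version__.split(".")[0])
        print("TensorFlow", tf.__version__, "OK" if major >= 2 else "(needs >= 2.0)")
    for dist in find_missing(OPTIONAL):
        print("Optional package not installed:", dist)
```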
Contributing
Contributions are welcome! Please open an issue or pull request for improvements or bug fixes.
License
This project is licensed under the MIT License.