Automated Classification and Cell Counting of Lung Cancer from Histopathological Images Using Machine Learning (LPAI)
LPAI uses a convolutional neural network implemented in PyTorch to automatically classify and count lung cancer cells in histopathological images. The project implements image preprocessing, model training, and performance evaluation, and visualizes predictions alongside cell counts.
After executing src/train.py in a virtual environment, here are some example outputs:
The classification report below corresponds to the confusion matrix above and summarizes the model's performance for the same run on a subset of 5000 images. It was generated with scikit-learn's classification_report function and printed in the virtual environment's terminal. It lists the precision, recall, F1-score, and support for each class based on the model's predictions:
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
lung_aca | 0.98 | 0.93 | 0.95 | 336 |
lung_n | 0.99 | 1.00 | 1.00 | 344 |
lung_scc | 0.94 | 0.97 | 0.96 | 320 |
accuracy | | | 0.97 | 1000 |
macro avg | 0.97 | 0.97 | 0.97 | 1000 |
weighted avg | 0.97 | 0.97 | 0.97 | 1000 |
Overall Accuracy: 0.97
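To make the two summary rows easier to read, here is a minimal sketch that recomputes the macro and weighted averages from the per-class rows of the report above. The helper name `averages` is hypothetical and not part of the project. Because the inputs are the rounded values printed in the table, the results only approximate what scikit-learn computes from the unrounded scores.

```python
# Per-class values copied from the report above: (precision, recall, f1, support).
report = {
    "lung_aca": (0.98, 0.93, 0.95, 336),
    "lung_n": (0.99, 1.00, 1.00, 344),
    "lung_scc": (0.94, 0.97, 0.96, 320),
}


def averages(report):
    """Return (macro, weighted) averages, each as (precision, recall, f1)."""
    rows = list(report.values())
    total_support = sum(row[3] for row in rows)
    # Macro average: every class counts equally.
    macro = tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))
    # Weighted average: every class counts in proportion to its support.
    weighted = tuple(
        sum(row[i] * row[3] for row in rows) / total_support for i in range(3)
    )
    return macro, weighted


macro, weighted = averages(report)
print("macro avg:   ", [round(v, 2) for v in macro])
print("weighted avg:", [round(v, 2) for v in weighted])
```

Both rows round to 0.97 in every column, which agrees with the table. The two averages are close here because the three classes have similar support (336, 344, 320).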
The visualization below showcases the model's predictions compared to the actual classifications for four randomly selected histopathological images from the loaded data. Each image is accompanied by a label indicating the true class and the class predicted by the model, along with the estimated cell count detected within the image.
Training data from Kaggle can be found here.
After downloading the dataset, unzip it, then move lung_image_sets into the data/ directory of your local repository, as shown below.
2024_1_project_06/
│
├── data/                      # upload training data downloaded from Kaggle
│   └── lung_image_sets/
│       ├── lung_aca/
│       ├── lung_n/
│       └── lung_scc/
│
├── src/
│   ├── __init__.py            # makes src a Python package
│   ├── train.py               # training script
│   └── test/
│       └── test.py
│
├── LICENSE
├── README.md
└── requirements.txt
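Once lung_image_sets is in place, a quick check of the folder layout can catch a misplaced or partially unzipped dataset before training starts. The sketch below is not part of the repository; `count_images` and `EXPECTED_CLASSES` are hypothetical names, and the only assumption taken from this README is the three class folders shown in the tree above.

```python
from pathlib import Path

# Class folder names taken from the directory tree above.
EXPECTED_CLASSES = ("lung_aca", "lung_n", "lung_scc")


def count_images(root="data/lung_image_sets"):
    """Return {class_name: file_count}; raise if a class folder is missing."""
    root = Path(root)
    counts = {}
    for name in EXPECTED_CLASSES:
        class_dir = root / name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # Count regular files only; subdirectories are ignored.
        counts[name] = sum(1 for p in class_dir.iterdir() if p.is_file())
    return counts


dataset_root = Path("data/lung_image_sets")
if dataset_root.is_dir():
    print(count_images(dataset_root))
else:
    print(f"{dataset_root} not found; run this from the repository root")
```

Run it from the repository root so that the relative path data/lung_image_sets resolves.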
git clone https://github.com/sfu-cmpt340/2024_1_project_06.git
cd 2024_1_project_06
Create a virtual environment and activate it:
For Windows:
python -m venv env
env\Scripts\activate
For Unix systems (including Linux and macOS):
python -m venv env
source env/bin/activate
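To confirm that the environment is active before installing anything, you can ask the interpreter itself. This is a small sketch using only the standard library; `in_virtualenv` is a hypothetical helper name. It relies on the fact that inside an environment created with `python -m venv`, `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line reports False, activate the environment again with the command for your platform shown above.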
Install PyTorch by selecting the appropriate installation command from the PyTorch official Get Started page. The command you choose should correspond to your operating system, package manager (pip), Python version, and whether you need CUDA support.
For example, if you're installing PyTorch with CUDA 11.8 support:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
After PyTorch is installed, install the remaining project dependencies:
pip install -r requirements.txt
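After installation, it can help to check which PyTorch build was picked up and whether CUDA is visible to it. The following is a minimal sketch, not part of the project; `describe_torch_install` is a hypothetical helper, and it returns a message instead of failing when torch is absent.

```python
import importlib.util


def describe_torch_install():
    """Summarise the PyTorch installation without failing when it is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    # torch.cuda.is_available() is False for CPU-only builds or missing drivers.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return f"torch {torch.__version__}, usable device: {device}"


print(describe_torch_install())
```

A result of `cpu` after installing a CUDA build usually points to a driver or wheel mismatch rather than a problem with this project.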
To start training, run the main script, train.py:
python src/train.py
To reproduce the results of the LPAI project after cloning the repository and setting up your development environment:
- Set up your dataset and complete the installation steps above.
- Stay in the root of your project repository. The command below refers to src/train.py relative to the root, so it will not resolve from inside the src directory.
- To start training the model, run the train.py script:
python src/train.py
The script train.py is configured with predefined parameters for training. If you need to adjust settings such as the total_subset_size or the number of epochs, you will have to do so directly within the script.
For example, to change the total_subset_size, locate and modify the following line in train.py:
train_loader, test_loader, class_names = load_data(total_subset_size=400) # Adjust this number as needed
After making the necessary changes, save the script and execute it as shown above. The training process will begin using the new subset size you specified.
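This README does not show how load_data uses total_subset_size, so the following is only a sketch of one plausible interpretation: draw a random subset of that size, then hold out a share of it for testing. The names `split_subset`, `num_images`, `test_fraction`, and `seed` are all hypothetical. The 0.2 test share is an assumption, chosen only because the report above lists a support of 1000 for a 5000-image subset.

```python
import random


def split_subset(num_images, total_subset_size, test_fraction=0.2, seed=0):
    """Pick a random subset of image indices and split it into train/test."""
    if total_subset_size > num_images:
        raise ValueError("total_subset_size exceeds the number of available images")
    rng = random.Random(seed)
    # Sample without replacement so no image appears twice.
    subset = rng.sample(range(num_images), total_subset_size)
    n_test = round(total_subset_size * test_fraction)
    return subset[n_test:], subset[:n_test]


# num_images=1000 is a placeholder, not the real dataset size.
train_idx, test_idx = split_subset(num_images=1000, total_subset_size=400)
print(len(train_idx), len(test_idx))  # 320 80
```

Under this assumption, a subset size of 400 yields 320 training and 80 test images, so very small values leave few test samples per class and make the reported metrics noisier.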