
Automated Classification and Cell Counting of Lung Cancer from Histopathological Images Using Machine Learning (LPAI)

LPAI integrates a convolutional neural network model in PyTorch for automatic classification and quantification of lung cancer cells in histopathological images. This project implements preprocessing for enhanced image analysis, includes functionalities for model training, performance evaluation, and provides visualization of predictions alongside cell counts.

Important Links

Timesheet | Slack channel | Project report

Video/demo/GIF

CMPT340 Lung Histopathology Classification

Table of Contents

  1. Demo

  2. Dataset Setup

  3. Installation

  4. Reproducing this project

1. Demo - Example output

Running src/train.py in a virtual environment produces outputs such as the following:

Confusion Matrix on Max Subset Size of 5000 Images

Confusion Matrix

Classification Report for the Same Subset

The classification report below corresponds to the confusion matrix above and summarizes the model's performance on the same 5000-image subset. It was generated with scikit-learn's classification_report function and printed to the virtual environment terminal. It lists the precision, recall, F1-score, and support for each class:

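| Class        | Precision | Recall | F1-score | Support |
|--------------|-----------|--------|----------|---------|
| lung_aca     | 0.98      | 0.93   | 0.95     | 336     |
| lung_n       | 0.99      | 1.00   | 1.00     | 344     |
| lung_scc     | 0.94      | 0.97   | 0.96     | 320     |
| accuracy     |           |        | 0.97     | 1000    |
| macro avg    | 0.97      | 0.97   | 0.97     | 1000    |
| weighted avg | 0.97      | 0.97   | 0.97     | 1000    |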

Overall Accuracy: 0.97
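As a quick sanity check on the report, the macro and weighted averages can be recomputed from the per-class rows. The sketch below is not part of train.py; it only copies the rounded numbers from the report above, and the helper names macro_avg and weighted_avg are made up for illustration. Because the inputs are already rounded to two decimals, the results match the report only after rounding.

```python
# Per-class rows copied from the classification report above:
# (precision, recall, f1-score, support)
report = {
    "lung_aca": (0.98, 0.93, 0.95, 336),
    "lung_n": (0.99, 1.00, 1.00, 344),
    "lung_scc": (0.94, 0.97, 0.96, 320),
}

SUPPORT = 3  # index of the support column in each row


def macro_avg(rows, col):
    """Unweighted mean of one metric column across classes."""
    return sum(row[col] for row in rows) / len(rows)


def weighted_avg(rows, col):
    """Mean of one metric column, weighted by each class's support."""
    total = sum(row[SUPPORT] for row in rows)
    return sum(row[col] * row[SUPPORT] for row in rows) / total


rows = list(report.values())
total_support = sum(row[SUPPORT] for row in rows)
print(f"total support: {total_support}")

for col, name in enumerate(["precision", "recall", "f1-score"]):
    print(
        f"{name:>9}: macro={macro_avg(rows, col):.2f} "
        f"weighted={weighted_avg(rows, col):.2f}"
    )
```

The class supports (336, 344, 320) add up to the 1000 images in the support column, and each average rounds to the 0.97 shown in the report.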

Cell Detection and Localization in Histopathological Images

Cell Counting

Predicted vs. Actual Classification Visualization

The visualization below compares the model's predictions with the actual classes for four randomly selected histopathological images from the loaded data. Each image is labeled with its true class, the class predicted by the model, and the estimated cell count detected within the image.

Confusion Matrix

2. Dataset setup

Training data from Kaggle can be found here.

After downloading the dataset, unzip it and move the lung_image_sets folder into the data/ directory of your local repository, as shown below.

2024_1_project_06/
│
├── data/ # upload training data downloaded from Kaggle
│   └── lung_image_sets/
│       ├── lung_aca/
│       ├── lung_n/
│       └── lung_scc/
│
├── src/
│   ├── __init__.py # makes src a Python package
│   ├── train.py # training script
│   └── test/ 
│       └── test.py
│
├── LICENSE
├── README.md
└── requirements.txt
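Before training, it can help to confirm that the folders ended up where the tree above expects them. The following is a minimal sketch, not a script shipped with the repository; the helper name count_images_per_class is hypothetical, and it only assumes the data/lung_image_sets layout and the three class folder names shown above.

```python
from pathlib import Path

# Class folder names taken from the directory tree above.
EXPECTED_CLASSES = ("lung_aca", "lung_n", "lung_scc")


def count_images_per_class(repo_root="."):
    """Return {class_name: file_count} for data/lung_image_sets under repo_root.

    Raises FileNotFoundError if an expected class folder is missing.
    """
    image_root = Path(repo_root) / "data" / "lung_image_sets"
    counts = {}
    for class_name in EXPECTED_CLASSES:
        class_dir = image_root / class_name
        if not class_dir.is_dir():
            raise FileNotFoundError(f"Missing class folder: {class_dir}")
        # Count regular files only; subfolders are ignored.
        counts[class_name] = sum(1 for p in class_dir.iterdir() if p.is_file())
    return counts
```

Calling count_images_per_class() from the repository root should return one count per class, or raise an error naming the folder that is missing.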

3. Installation

git clone https://github.com/sfu-cmpt340/2024_1_project_06.git
cd 2024_1_project_06

Create a virtual environment and activate it:

For Windows:

python -m venv env
env\Scripts\activate

For Unix systems (including Linux and macOS):

python -m venv env
source env/bin/activate
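To confirm that the environment is actually active before installing anything, one option is to compare the interpreter's prefixes from Python. This is a small sketch using only the standard library; the function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
```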

Install PyTorch by selecting the appropriate installation command from the PyTorch official Get Started page. The command you choose should correspond to your operating system, package manager (pip), Python version, and whether you need CUDA support.

For example, if you're installing PyTorch with CUDA 11.8 support:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
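Once the install finishes, a short check can show which PyTorch build was picked up and whether it sees a CUDA device. The sketch below is an illustration rather than part of the project; describe_torch_install is a hypothetical helper, and it uses only torch.__version__ and torch.cuda.is_available().

```python
import importlib.util


def describe_torch_install() -> str:
    """Summarize the PyTorch install without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check above can run first

    cuda = "available" if torch.cuda.is_available() else "not available"
    return f"torch {torch.__version__}, CUDA {cuda}"


print(describe_torch_install())
```

If CUDA is reported as not available on a machine with a supported GPU, the install command may not match the local CUDA setup.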

After PyTorch is installed, install the remaining project dependencies:

pip install -r requirements.txt

To start training, run the main script, train.py:

python src/train.py

4. Reproduction

To reproduce the results of the LPAI project after cloning and setting up your development environment:

  1. Set up your dataset and follow the installation process.
  2. Stay in the root of your project repository; the command below refers to the script by its path, src/train.py.
  3. To start training the model, run the train.py script:
python src/train.py

The script train.py is configured with predefined parameters for training. If you need to adjust settings such as the total_subset_size or the number of epochs, you will have to do so directly within the script.

For example, to change the total_subset_size, locate and modify the following line in train.py:

train_loader, test_loader, class_names = load_data(total_subset_size=400)  # Adjust this number as needed

After making the necessary changes, save the script and execute it as shown above. The training process will begin using the new subset size you specified.
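If editing the script for every run becomes tedious, one alternative is to read these settings from the command line. The sketch below is an assumption about how that could look, not how train.py currently works: the flag names and the parse_args helper are hypothetical, the default of 400 is taken from the line shown above, and the epochs default is left as None because the script's actual value is not documented here.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical training options; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Train the LPAI classifier")
    parser.add_argument(
        "--total-subset-size",
        type=int,
        default=400,  # matches the value in the load_data call shown above
        help="number of images to load across all classes",
    )
    parser.add_argument(
        "--epochs",
        type=int,
        default=None,  # None means: keep whatever train.py already uses
        help="number of training epochs",
    )
    return parser.parse_args(argv)


# Inside train.py, the parsed value could then replace the hard-coded one:
# args = parse_args()
# train_loader, test_loader, class_names = load_data(
#     total_subset_size=args.total_subset_size
# )
```

With something like this in place, a run such as python src/train.py --total-subset-size 5000 would select the larger subset without touching the source.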
