To download the full 40 GB dataset, follow these steps:
- Download 'archive.zip' from https://www.kaggle.com/datasets/washingtongold/lidcidri30
- Unzip it into the lidc directory (this may take a while):
unzip archive.zip -d lidc
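
If the unzip command is not available on your system, the same extraction can be sketched with Python's standard zipfile module. This is a minimal sketch, not part of this repository: the function name extract_archive is hypothetical, and the archive.zip and lidc names are taken from the steps above.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str) -> int:
    """Extract zip_path into dest_dir and return the number of archive entries."""
    dest = Path(dest_dir)
    # Create the target directory (e.g. lidc) if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return len(archive.namelist())
```

Calling extract_archive("archive.zip", "lidc") mirrors the unzip command above. For a 40 GB archive, expect it to take a while as well.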
Run the prepare-data script to convert the dataset into a format the model can use.
Our prepare-data.py script uses pylidc, which depends on some old packages. Everything works fine with Python 3.9. For model training/inference, however, you should use the latest Python version.
python3.9 -m venv venv
source venv/bin/activate
python3.9 -m pip install -r prepare-data-requirements.txt
python3.9 prepare-data.py
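
Because the preparation step is tied to Python 3.9, a small guard can fail early when a different interpreter is active. The sketch below is only an illustration: the names is_prepare_python and require_prepare_python are hypothetical, and prepare-data.py is not known to contain such a check.

```python
import sys


def is_prepare_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.9.x."""
    return tuple(version_info[:2]) == (3, 9)


def require_prepare_python(version_info=sys.version_info) -> None:
    """Raise a RuntimeError with a hint when the interpreter is not 3.9."""
    if not is_prepare_python(version_info):
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(
            f"The data preparation step was tested with Python 3.9, found {found}"
        )
```

A call to require_prepare_python() at the top of a script would stop with a clear message instead of failing later on a dependency error.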
Alternatively, to save time, you can use the already prepared zip of 3000 images: data.zip.