Quick start • Model comparison • Slow start • Contributions • Known issues • Contact info
- Set up the code for this project.
  - Download the source code ZIP file or the source code TAR file and extract it.
  - Set up a Python environment, if you don't already have one. I used PyCharm, ver. 2023.2.4, Community Edition.
  - Install dependencies as needed. I used Python 3.8, TensorFlow 2.10.1, and Keras 2.10.0.
- Get the data from Kaggle.
  - Download the OpenCrystalData Crystal Impurity Detection dataset and extract it.
  - Edit `GAsplitDataIntoTrainValidandTest.py` so that the `folder_prefix` variable points to the data on your PC.
  - Run `GAsplitDataIntoTrainValidandTest.py` to split the data into training, validation, and testing sets.
- Run the training.
  - Edit `GAmain.py` so that the `folder_prefix` variable points to the OpenCrystalData folder on your PC.
  - Run `GAmain.py` to train the model.
  - Check the results in the \GAdeliverables folder (where you extracted the data).
- Play time.
  - Download the HDF5 weights file or the ONNX weights file to the \inference folder.
  - Edit `GAinference.py` so that the `folder_prefix` variable points to the Georgia Project code.
  - Run `GAinference.py` to label images in the \inference folder and plot images in the \kmeans folder.
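
The data-splitting step above can be pictured with a minimal sketch. This is not the code in `GAsplitDataIntoTrainValidandTest.py`; it only illustrates one way to shuffle a list of image file names and cut it into the 70/25/5 train/validation/test proportions listed in the model comparison table. The function name `split_items`, the fixed seed, and the example file names are assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_items(
    items: Sequence[str],
    fractions: Tuple[float, float, float] = (0.70, 0.25, 0.05),
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle items and cut them into train/valid/test parts.

    `fractions` defaults to the 70/25/5 split from the comparison table.
    The seed is an assumption, used only to make the shuffle repeatable.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # Whatever is left over goes to the test set.
        "test": shuffled[n_train + n_valid:],
    }


# Hypothetical file names standing in for one class of crystal images.
example_files = [f"image_{i:04d}.png" for i in range(200)]
parts = split_items(example_files)
print(len(parts["train"]), len(parts["valid"]), len(parts["test"]))  # 140 50 10
```

In practice the split would be applied to each class separately so that both classes keep the same proportions in every subset.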
The table below lists the details reported in the published paper alongside the choices I made for this project.
| | Salami et al. paper | my work |
|---|---|---|
| framework | MATLAB | Python (TensorFlow/Keras) in PyCharm |
| model type | ResNet-18, ResNet-50 | ResNet-101 |
| optimization method | SGDM | Keras SGD (momentum 0.9) |
| learning rate | 1 × 10⁻⁴ | 1 × 10⁻¹ |
| training data | 3200–3600 in each class | (same) |
| train/val./test % | 70/25/5% | (same) |
| minibatch size | 32–64 | 64 |
| validation frequency | 10–50 | 1 |
| added dropout layers | (did not comment) | 2 |
| trainable ImageNet layers | (did not comment) | made last 10% trainable |
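
The "made last 10% trainable" row can be illustrated with a small sketch. It assumes that "last 10%" is counted by number of layers, which the table does not state, and the helper name `unfreeze_last_fraction` is hypothetical. The helper works on any objects that expose a boolean `trainable` attribute, as Keras layers do, so the example below runs on stand-in objects without TensorFlow installed.

```python
from types import SimpleNamespace
from typing import Sequence


def unfreeze_last_fraction(layers: Sequence, fraction: float = 0.10) -> int:
    """Freeze every layer, then mark the last `fraction` of them trainable.

    Assumption: the fraction is measured by layer count, not by parameters.
    Returns the number of layers left trainable.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_trainable = round(len(layers) * fraction)
    first_trainable = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= first_trainable
    return n_trainable


# Stand-in objects so the sketch runs without TensorFlow.
fake_layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(50)]
print(unfreeze_last_fraction(fake_layers))  # 5
```

With Keras, the same helper could be called as `unfreeze_last_fraction(base_model.layers)`, where `base_model` is a hypothetical pretrained backbone such as `tf.keras.applications.ResNet101(weights="imagenet", include_top=False)`.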
The scientists who wrote the paper trained ResNet models, initialized with ImageNet weights, on the OpenCrystalData dataset. The models were trained to do binary classification of crystal images, labeling each one as either CEX (a.k.a. "cephalexin antibiotic," a good thing) or PG (a.k.a. "phenylglycine," a bad thing). This project recreates their work.
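
As a rough illustration of the binary labeling described above, the sketch below turns a model's output probability into a CEX or PG label. It makes several assumptions that this README does not confirm: that the model emits a single probability for the PG class, that class index 0 is CEX and index 1 is PG, and that 0.5 is the decision threshold. The names `label_from_probability` and `label_images` are hypothetical.

```python
from typing import Iterable, List

# Assumption: index 0 is CEX and index 1 is PG (alphabetical order).
CLASS_NAMES = ("CEX", "PG")


def label_from_probability(p_pg: float, threshold: float = 0.5) -> str:
    """Map an assumed PG probability to a class label."""
    if not 0.0 <= p_pg <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return CLASS_NAMES[1] if p_pg >= threshold else CLASS_NAMES[0]


def label_images(probabilities: Iterable[float], threshold: float = 0.5) -> List[str]:
    """Label a batch of assumed PG probabilities."""
    return [label_from_probability(p, threshold) for p in probabilities]


# Made-up probabilities, for illustration only.
print(label_images([0.03, 0.91, 0.48]))  # ['CEX', 'PG', 'CEX']
```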
Here is the research paper: Salami, H., McDonald, M. A., Bommarius, A. S., Rousseau, R. W., & Grover, M. A. (2021). In Situ Imaging Combined with Deep Learning for Crystallization Process Monitoring: Application to Cephalexin Production. Organic Process Research & Development, 25(7), 1670–1679.
Here is the Georgia Project's detailed documentation.
Go to the main doc file
If you find an issue or would like to suggest an improvement to the code or documentation, please open the Issues tab on the project page and leave me a note. If you like this project, leave a star.
Known issues: none.
For more details about this project, feel free to reach out to me at katherinemossdeveloper@gmail.com or through my LinkedIn account.
My time zone is EST in the U.S.