This repo contains the work of Blake Curtsinger, Dennis Ross, Ronald Collins, and Tanner Cook for their final data mining project. The project is a former Kaggle competition that involves creating a model to classify images of dogs and cats.
You can find the competition here: https://www.kaggle.com/competitions/dogs-vs-cats/overview
Our team chose Python for this project specifically because we lacked experience with data science projects in Python. We wrote all of our code in a Jupyter Notebook, as that generally allows for a better presentation.
At the start of this project, we had NO experience working with images in Python, pre-processing images in general, or using TensorFlow. Therefore, our work takes significant inspiration from Gabriel Atkin's YouTube content.
You can find Gabriel's channel here: https://www.youtube.com/@gcdatkin
Due to technical limitations and time constraints, we downsized the data set significantly, from about 25,000 images to just 1,000. When doing so, we removed images at random and kept the two classes balanced.
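
As a rough illustration of that downsizing step, here is a minimal sketch of balanced random sampling. It is not the exact code from our notebook. It assumes the class label is the part of the file name before the first dot (for example `cat.0.jpg`), and the function name `sample_balanced`, the seed, and the synthetic file list are all hypothetical.

```python
import random
from collections import defaultdict


def sample_balanced(filenames, per_class, seed=0):
    """Pick `per_class` file names at random from each class."""
    by_class = defaultdict(list)
    for name in filenames:
        # Assumption: the label is the prefix before the first dot, e.g. "cat.0.jpg".
        label = name.split(".")[0]
        by_class[label].append(name)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for label in sorted(by_class):
        # random.sample draws without replacement, so no image is picked twice.
        kept.extend(rng.sample(sorted(by_class[label]), per_class))
    return kept


# Synthetic file names standing in for the full data set (illustration only).
all_files = [f"{label}.{i}.jpg" for label in ("cat", "dog") for i in range(12500)]
subset = sample_balanced(all_files, per_class=500)
```

With two classes and `per_class=500`, the resulting subset holds 1,000 file names, half cats and half dogs.
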
This was an excellent introduction to TensorFlow and to working with images. We made a considerable effort to explain most of the code, so we hope it can be of some use to others who are interested in this subject.
- Our notebook assumes that this repo is your working directory.
- You will need to extract the .zip file from Kaggle into your repo folder, then add a new folder called "images" inside the extracted output.
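
If you would rather script the setup than do it by hand, the following is a minimal sketch using only the Python standard library. The helper name `prepare_data` is hypothetical, and the archive and folder names in the usage comment are placeholders: substitute whatever your Kaggle download is actually called.

```python
import zipfile
from pathlib import Path


def prepare_data(zip_path, extract_dir):
    """Extract the archive and create an empty "images" folder inside the output."""
    extract_dir = Path(extract_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extract_dir)

    # The notebook expects an "images" folder inside the extracted output.
    images_dir = extract_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    return images_dir


# Example call, run from the repo root (both names below are placeholders):
# prepare_data("your-kaggle-download.zip", "your-output-folder")
```
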