This project develops a real-time handwritten digit recognition system using a Convolutional Neural Network (CNN) and deploys it as a web application.
The project is structured into three main phases:
Phase 1: Data Preparation
Objective: To load, preprocess, and understand the MNIST dataset.
Activities:
- Loading the MNIST dataset (handwritten digits 0-9).
- Normalizing pixel values to a [0, 1] range.
- Reshaping the image data to include a channel dimension, suitable for CNN input.
- Converting labels to one-hot encoding.
- Visualizing sample images and the distribution of digits in the dataset.
- Saving the processed data (X_train.npy, y_train.npy, X_test.npy, y_test.npy) for subsequent phases.
Key Files: mnist_data_prep.py
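The Phase 1 activities can be sketched roughly as follows. This is a minimal illustration, not the actual contents of mnist_data_prep.py: the function names preprocess and load_and_save are hypothetical, the one-hot encoding is done with NumPy rather than a Keras helper, and loading through tensorflow.keras.datasets.mnist is an assumption about how the dataset is obtained.

```python
import os

import numpy as np

NUM_CLASSES = 10  # digits 0-9


def preprocess(images, labels):
    """Scale pixels to [0, 1], add a channel axis, and one-hot encode labels."""
    x = images.astype("float32") / 255.0              # uint8 0..255 -> float 0..1
    x = x[..., np.newaxis]                            # (N, H, W) -> (N, H, W, 1)
    y = np.eye(NUM_CLASSES, dtype="float32")[labels]  # (N,) -> (N, 10)
    return x, y


def load_and_save(out_dir="."):
    """Load MNIST via Keras, preprocess it, and write the four .npy files."""
    # Imported lazily so preprocess() can be used without TensorFlow installed.
    from tensorflow.keras.datasets import mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    arrays = {}
    arrays["X_train"], arrays["y_train"] = preprocess(x_train, y_train)
    arrays["X_test"], arrays["y_test"] = preprocess(x_test, y_test)
    for name, array in arrays.items():
        np.save(os.path.join(out_dir, name + ".npy"), array)
    return arrays
```

Calling load_and_save() would download the dataset on first use, so it needs network access; preprocess() itself works on any integer image array.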
Phase 2: Model Training
Objective: To design, train, and evaluate a Convolutional Neural Network for digit recognition.
Activities:
- Designing a CNN architecture with multiple convolutional and pooling layers, followed by dense layers.
- Compiling the model with an appropriate optimizer (Adam) and loss function (categorical cross-entropy).
- Training the model on the prepared MNIST training data.
- Implementing callbacks such as Early Stopping (to prevent overfitting) and Model Checkpoint (to save the best performing model).
- Evaluating the trained model's performance on the test set.
- Visualizing training history (accuracy and loss plots) and generating a confusion matrix to assess classification performance.
- Saving the trained model (mnist_cnn_model.h5).
Key Files: mnist_model.py
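One possible shape for the Phase 2 model and callbacks is sketched below. The layer sizes, dropout rate, patience value, and the helper names build_model and make_callbacks are illustrative assumptions; the architecture actually used in mnist_model.py may differ. Only the optimizer (Adam), the loss (categorical cross-entropy), the two callback types, and the output file name come from the description above.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model():
    """Small CNN: two conv/pool stages followed by dense layers (sizes assumed)."""
    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),          # grayscale 28x28 input
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),                      # assumed regularization
        layers.Dense(10, activation="softmax"),   # one probability per digit
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",          # matches one-hot labels
        metrics=["accuracy"],
    )
    return model


def make_callbacks(checkpoint_path="mnist_cnn_model.h5"):
    """Early stopping plus best-model checkpointing, both watching val_loss."""
    return [
        keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=3, restore_best_weights=True
        ),
        keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss", save_best_only=True
        ),
    ]


# Example training call (epochs, batch size, and split are assumed values):
# model = build_model()
# model.fit(X_train, y_train, validation_split=0.1, epochs=20,
#           batch_size=128, callbacks=make_callbacks())
```

Monitoring validation loss for both callbacks means training stops once the model stops improving on held-out data, and the file on disk always holds the best epoch seen so far.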
Phase 3: Web Application
Objective: To create an interactive web application for real-time digit recognition.
Activities:
- Loading the pre-trained CNN model.
- Developing a web interface using Streamlit, allowing users to draw digits on a canvas.
- Implementing a preprocessing function to transform the drawn image into the format expected by the model.
- Integrating the model's prediction logic into the web app to provide real-time predictions, confidence scores, and probability distributions.
- Providing example images for quick testing.
Key Files: app.py
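The preprocessing and prediction-summary steps of Phase 3 might look like the sketch below. It is framework-independent and uses only NumPy; the function names preprocess_canvas and summarize_prediction are hypothetical. It assumes the canvas is a square uint8 image whose side is a multiple of 28 (for example 280x280) with dark strokes on a light background, and it downsamples by block averaging instead of an OpenCV resize. The real app.py may handle these details differently.

```python
import numpy as np


def preprocess_canvas(canvas, invert=True):
    """Convert a square drawing into a (1, 28, 28, 1) float32 batch in [0, 1].

    Assumes `canvas` is shaped (H, W) or (H, W, C) with H == W and H % 28 == 0.
    """
    img = np.asarray(canvas).astype("float32")
    if img.ndim == 3:
        img = img[..., :3].mean(axis=-1)   # ignore alpha, average colour channels
    side = img.shape[0]
    if img.shape[0] != img.shape[1] or side % 28 != 0:
        raise ValueError("canvas must be square with a side divisible by 28")
    k = side // 28
    # Block-average each k x k tile down to a single pixel.
    img = img.reshape(28, k, 28, k).mean(axis=(1, 3))
    img /= 255.0
    if invert:
        img = 1.0 - img                    # MNIST digits are light on dark
    return img.reshape(1, 28, 28, 1)


def summarize_prediction(probs):
    """Return (predicted digit, confidence, per-digit probability dict)."""
    probs = np.asarray(probs, dtype="float64").ravel()
    digit = int(np.argmax(probs))
    distribution = {str(d): float(p) for d, p in enumerate(probs)}
    return digit, float(probs[digit]), distribution


# Example use with a loaded Keras model (model loading is not shown here):
# probs = model.predict(preprocess_canvas(canvas))[0]
# digit, confidence, distribution = summarize_prediction(probs)
```

Matching the training format matters here: the model was trained on inputs scaled to [0, 1] with a trailing channel dimension, so the drawn image has to arrive in that same layout.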
Technologies Used:
- Python: The primary programming language for the entire project.
- TensorFlow/Keras: For building, training, and evaluating the Convolutional Neural Network models.
- NumPy: For numerical operations and efficient handling of array data.
- Matplotlib: For data visualization, including plotting training history, sample images, and confusion matrices.
- Scikit-learn: For data splitting (e.g., train_test_split) and evaluation metrics (e.g., confusion_matrix, classification_report).
- Streamlit: For rapidly building and deploying the interactive web application.
- OpenCV (cv2): Possibly used for image processing tasks within the web application, such as resizing and color conversion.
Setup and Usage:
- Clone the repository (if applicable).
- Navigate to the project directory (adjust the path to match your local copy):
    cd D:\digit-recognizer
- Install dependencies (ensure you have pip installed):
    pip install tensorflow numpy matplotlib scikit-learn streamlit opencv-python
- Run Phase 1 (Data Preparation):
    python mnist_data_prep.py
  This will generate X_train.npy, y_train.npy, X_test.npy, y_test.npy.
- Run Phase 2 (Model Training):
    python mnist_model.py
  This will train the initial model and save mnist_cnn_model.h5.
- Run Phase 3 (Web Application):
    streamlit run app.py
  This will launch the Streamlit web interface in your browser.
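Because each phase relies on files written by the previous one, a quick check like the one below can help confirm that the phases were run in order. It is a small convenience sketch, not part of the project: the helper name missing_outputs is hypothetical, and it assumes all output files are written to the project directory under the names listed above.

```python
from pathlib import Path

# Files each phase is expected to produce, per the steps above.
PHASE_OUTPUTS = {
    "Phase 1": ["X_train.npy", "y_train.npy", "X_test.npy", "y_test.npy"],
    "Phase 2": ["mnist_cnn_model.h5"],
}


def missing_outputs(project_dir="."):
    """Return {phase: [missing file names]} for phases with missing outputs."""
    root = Path(project_dir)
    report = {}
    for phase, names in PHASE_OUTPUTS.items():
        missing = [name for name in names if not (root / name).is_file()]
        if missing:
            report[phase] = missing
    return report


if __name__ == "__main__":
    problems = missing_outputs()
    if problems:
        for phase, names in problems.items():
            print(f"{phase}: missing {', '.join(names)}")
    else:
        print("All expected outputs are present.")
```

An empty result means the data files and the trained model are in place, so the web application should be able to load the model.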
Feel free to explore each script and modify them as needed for further experimentation and improvement.
