This project demonstrates how to use MobileNetV2 as a feature extractor for the Fashion MNIST dataset, combining transfer learning with a custom classification head.
It also showcases the use of Keras callbacks — including EarlyStopping, ModelCheckpoint, and TensorBoard — to optimise training and track performance.
The model’s best hyperparameters (dense units, dropout rate, learning rate, optimiser) were taken from a prior Random Search hyperparameter-tuning run, documented in the companion repository cnn-keras-tuning-fashion-mnist.
- Uses MobileNetV2 pretrained on ImageNet for feature extraction
- Adds a custom dense layer & dropout for Fashion MNIST classification
- Implements key Keras callbacks (see the sketch after this list):
  - EarlyStopping → stops training when validation loss stops improving
  - ModelCheckpoint → saves the best-performing model automatically
  - TensorBoard → visualises training curves, histograms, and more
- Evaluates model performance across multiple independent runs to assess stability
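A minimal sketch of how these three callbacks are typically wired together in Keras. The monitored metric, patience value, and file paths below are illustrative assumptions rather than the exact settings in `callbacks_fashion_mnist.py`:

```python
import datetime
import tensorflow as tf

# Illustrative settings; the values used in callbacks_fashion_mnist.py may differ.
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

callbacks = [
    # Stop when validation loss has not improved for 3 epochs and roll back
    # to the best weights seen so far.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True
    ),
    # Persist only the best model (lowest validation loss) in .keras format.
    tf.keras.callbacks.ModelCheckpoint(
        "saved_models/best_model.keras", monitor="val_loss", save_best_only=True
    ),
    # Log scalars and weight histograms for inspection in TensorBoard.
    tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1),
]

# history = model.fit(x_train, y_train, validation_split=0.1,
#                     epochs=50, callbacks=callbacks)
```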
cnn-callbacks-fashion-mnist/
│
├── callbacks_fashion_mnist.py # Main training script
├── saved_models/ # Best model saved as .keras
├── logs/ # TensorBoard logs
├── requirements.txt
├── .gitignore
└── README.md
1️⃣ Clone the repo
git clone https://github.com/adabyt/cnn-callbacks-fashion-mnist.git
cd cnn-callbacks-fashion-mnist
2️⃣ Install dependencies
pip install -r requirements.txt
3️⃣ Run the script
python callbacks_fashion_mnist.py
4️⃣ Launch TensorBoard (optional)
tensorboard --logdir logs/fit
Then open http://localhost:6006 to visualise training.
The final model structure (see the code sketch below):
- MobileNetV2 (frozen) → feature extraction
- Global Average Pooling
- Dense (480 units, ReLU)
- Dropout (0.3)
- Dense (10 units, Softmax)
Total trainable parameters: ~619k (only the custom head; the MobileNetV2 base is frozen)
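A minimal Keras sketch of the architecture summarised above. Fashion MNIST images are 28×28 grayscale while MobileNetV2 expects RGB inputs of at least 32×32, so the resize target (96×96), the grayscale-to-RGB step, the loss, and the Adam settings here are assumptions for illustration, not necessarily what the training script does:

```python
import tensorflow as tf

IMG_SIZE = 96  # assumed resize target; MobileNetV2 needs RGB inputs of at least 32x32

base = tf.keras.applications.MobileNetV2(
    input_shape=(IMG_SIZE, IMG_SIZE, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the pretrained feature extractor

inputs = tf.keras.Input(shape=(28, 28, 1))                    # Fashion MNIST images
x = tf.keras.layers.Resizing(IMG_SIZE, IMG_SIZE)(inputs)      # upscale for MobileNetV2
x = tf.keras.layers.Concatenate()([x, x, x])                  # grayscale -> 3 channels
x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(x)    # [0, 255] pixels -> [-1, 1]
x = base(x, training=False)                                   # run the frozen base in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(480, activation="relu")(x)          # tuned dense units
x = tf.keras.layers.Dropout(0.3)(x)                           # tuned dropout rate
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)  # 10 clothing classes

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(),    # tuned optimiser; learning rate assumed default here
    loss="sparse_categorical_crossentropy",  # assumes integer class labels
    metrics=["accuracy"],
)
```

With the base frozen, only the 480-unit dense layer and the 10-way output layer are trained, which matches the parameter count above (1280·480 + 480 + 480·10 + 10 ≈ 619.7k).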
The model was trained multiple times to evaluate run-to-run consistency:
| Run | Test Accuracy | Test Loss | Epochs Used |
|-----|---------------|-----------|-------------|
| 1   | 0.9079        | 0.2543    | 16          |
| 2   | 0.9076        | 0.2590    | 17          |
| 3   | 0.9056        | 0.2559    | 17          |
| 4   | 0.9080        | 0.2534    | 17          |
| 5   | 0.9065        | 0.2555    | 17          |
| 6   | 0.9052        | 0.2583    | 14          |
Overall Performance:
Mean test accuracy of `0.9068 ± 0.0011` across the 6 runs.
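This summary can be reproduced from the per-run accuracies in the table above (population standard deviation shown, which matches the quoted 0.0011; the sample standard deviation is ≈0.0012):

```python
import numpy as np

# Test accuracies from the six runs listed in the results table.
acc = np.array([0.9079, 0.9076, 0.9056, 0.9080, 0.9065, 0.9052])
print(f"{acc.mean():.4f} ± {acc.std():.4f}")  # 0.9068 ± 0.0011
```

The remaining spread is expected, because several parts of the training pipeline are stochastic (a reproducibility sketch follows this list):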
- Shuffle order – Different batch orders change weight updates.
- Weight initialisation – Initial weights differ unless a seed is fixed.
- Dropout randomness – A different random subset of units is dropped at each training step.
- Adam optimiser’s internal state – Tracks moving averages of gradients, which evolve differently per run.
- GPU non-determinism (less likely for smaller models) – Minor floating-point differences can accumulate.
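If exact run-to-run reproducibility were required, most of these sources could be pinned down in TensorFlow 2.x. A minimal sketch (not done in this project, since the goal was to measure natural variability):

```python
import tensorflow as tf

# Seed Python's `random`, NumPy, and TensorFlow in one call so data shuffling,
# weight initialisation, and dropout masks repeat across runs.
tf.keras.utils.set_random_seed(42)

# Request deterministic GPU/CPU kernels; this can slow training and raises an
# error for ops that have no deterministic implementation.
tf.config.experimental.enable_op_determinism()
```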
The logs suggest mild overfitting: training accuracy improved faster than validation accuracy.
Ways to mitigate overfitting (a short augmentation/regularisation sketch follows this list):
- Add more data
- Use data augmentation
- Apply regularisation (L1/L2)
- Increase dropout
- Rely on early stopping
- Add batch normalisation
- Simplify model architecture
- Tune hyperparameters further
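As a concrete illustration of the augmentation and regularisation options, a small sketch using Keras preprocessing layers and L2 weight decay on the dense head. The specific transforms, factors, and dropout rate are arbitrary examples, not tuned values:

```python
import tensorflow as tf

# Random transforms are active only during training and become no-ops at inference.
augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),     # most Fashion MNIST classes are roughly symmetric
    tf.keras.layers.RandomTranslation(0.1, 0.1),  # shift up to 10% in each direction
    tf.keras.layers.RandomZoom(0.1),              # zoom in/out by up to 10%
])

# Dense head with L2 weight decay and a slightly higher dropout rate.
head = tf.keras.Sequential([
    tf.keras.layers.Dense(
        480, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),
    ),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```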
This project shows how:
- Transfer learning with MobileNetV2 can effectively classify Fashion MNIST with minimal training time.
- Callbacks like EarlyStopping and ModelCheckpoint prevent overtraining and save the best model automatically.
- TensorBoard provides invaluable visualisation for debugging and performance tracking.
Final test accuracy: ~90.7% (mean across the 6 runs).