A Siamese neural network implementation for calculating similarity between pairs of images using the MNIST dataset. This project is implemented as an interactive Jupyter notebook with comprehensive visualizations and step-by-step explanations.
The goal of this project is to train a neural network that can calculate the similarity between pairs of images. The network distinguishes whether two images belong to the same class or not, using the MNIST dataset as a benchmark.
This implementation uses a Siamese neural network architecture that learns to embed images into a feature space where:
- Similar images (same class) are close together
- Dissimilar images (different classes) are far apart
The network is trained using pairs of images with binary labels indicating whether the images are from the same class (similar=1) or different classes (dissimilar=0). Cosine similarity between embeddings serves as the similarity metric, and the network is trained with binary cross-entropy loss.
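A minimal, self-contained sketch of this idea (not taken from the notebook): the cosine similarity of two L2-normalized embeddings is scored against the pair label with binary cross-entropy on logits. In the actual model the similarity first passes through a learnable projection, described below.

```python
import torch
import torch.nn.functional as F

# Dummy 128-dim embeddings for one image pair; in the real model these come from the network
emb1 = F.normalize(torch.randn(1, 128), dim=1)    # unit-length embedding of image 1
emb2 = F.normalize(torch.randn(1, 128), dim=1)    # unit-length embedding of image 2

cos_sim = F.cosine_similarity(emb1, emb2, dim=1)  # similarity score in [-1, 1]

# Score the similarity (used directly as a logit in this sketch) against the pair label
label = torch.tensor([1.0])                       # 1 = same class, 0 = different class
loss = F.binary_cross_entropy_with_logits(cos_sim, label)
print(f"similarity={cos_sim.item():.3f}  loss={loss.item():.3f}")
```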
- Input: 28×28×1 MNIST images
- Architecture:
  - Two convolutional blocks (Conv2d → BatchNorm → ReLU → MaxPool)
  - Fully connected layers with BatchNorm and ReLU
  - L2 normalization for unit-length embeddings
- Output: 128-dimensional normalized embedding vectors
- Takes two images as input
- Computes embeddings using the shared embedding network
- Calculates cosine similarity between embeddings
- Applies a learnable projection layer for binary classification
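A minimal sketch of what this Siamese wrapper could look like; the class name `SimilarityModel` and the `get_raw_similarity` method follow the usage example further down, but the internals here are an illustrative reconstruction, not the notebook's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityModel(nn.Module):
    """Siamese wrapper: shared embedding network + cosine similarity + learnable projection."""

    def __init__(self, embedding_net: nn.Module):
        super().__init__()
        self.embedding_net = embedding_net             # shared (twin) network
        self.proj = nn.Linear(1, 1)                    # learnable projection to a classification logit

    def get_raw_similarity(self, img1: torch.Tensor, img2: torch.Tensor) -> torch.Tensor:
        z1 = self.embedding_net(img1)                  # (B, 128), L2-normalized
        z2 = self.embedding_net(img2)
        return F.cosine_similarity(z1, z2, dim=1)      # (B,), in [-1, 1]

    def forward(self, img1: torch.Tensor, img2: torch.Tensor) -> torch.Tensor:
        sim = self.get_raw_similarity(img1, img2)
        return self.proj(sim.unsqueeze(1)).squeeze(1)  # logit for BCEWithLogitsLoss
```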
- Transform: Converts images to tensors with normalization (mean=0.1307, std=0.3081)
- Pair Generation: Custom `PairDataset` class creates pairs with 50% positive (same class) and 50% negative (different class) samples (see the sketch after this list)
- Optimizer: Adam (lr=1e-2, weight_decay=1e-5)
- Loss Function: Binary Cross-Entropy with Logits
- Scheduler: ReduceLROnPlateau (patience=2, factor=0.5)
- Batch Size: 256
- Epochs: 30
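The pair generation item above refers to the sketch here: an illustrative reconstruction of how `PairDataset` (the class name used in the notebook) and the normalization transform could be wired together; the sampling logic is an assumption chosen to match the 50/50 description.

```python
import random
import torch
from torch.utils.data import Dataset
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),       # MNIST mean/std
])

class PairDataset(Dataset):
    """Yields (img1, img2, label): ~50% same-class pairs (label=1), ~50% different-class pairs (label=0)."""

    def __init__(self, base):
        self.base = base
        targets = torch.as_tensor(base.targets)
        # Index sample positions by class so partners can be drawn quickly
        self.by_class = {c: torch.where(targets == c)[0].tolist() for c in range(10)}

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img1, c1 = self.base[idx]
        if random.random() < 0.5:                     # positive pair: same class
            partner = random.choice(self.by_class[c1])
            label = 1.0
        else:                                         # negative pair: different class
            c2 = random.choice([c for c in range(10) if c != c1])
            partner = random.choice(self.by_class[c2])
            label = 0.0
        img2, _ = self.base[partner]
        return img1, img2, torch.tensor(label)

train_pairs = PairDataset(datasets.MNIST("data", train=True, download=True, transform=transform))
```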
The implementation is organized into several key components:
- Data Loading and Preprocessing: MNIST dataset loading with normalization transforms
- PairDataset Class: Custom dataset for generating positive/negative image pairs
- EmbeddingNet Architecture: Convolutional neural network for feature extraction
- SimilarityModel: Siamese network combining embeddings with cosine similarity
- Training Setup: Loss function, optimizer, and evaluation utilities
- Training Loop: Main training process with validation and checkpointing
- Visualization: Results analysis and similarity visualization tools
- `torch` - PyTorch deep learning framework
- `torchvision` - Computer vision datasets and transforms
- `matplotlib` - Plotting and visualization
- `numpy` - Numerical computing
- `pandas` - Data manipulation and analysis
- `jupyter` - Jupyter notebook environment
- Clone the repository:

  ```bash
  git clone https://github.com/HasanAbdelhady/Calculating-Image-Similarity-using-Neural-Networks.git
  ```

- Install dependencies:

  ```bash
  pip install torch torchvision matplotlib numpy pandas jupyter
  ```

- Run the Jupyter notebook:

  ```bash
  jupyter notebook
  ```
Then open the main notebook file and run all cells sequentially. The notebook includes:
- Data loading and preprocessing
- Model architecture definitions
- Training loop with progress monitoring
- Results visualization and analysis
- Siamese Architecture: Shared weights between twin networks for consistent feature extraction
- Cosine Similarity: Uses normalized embeddings for robust similarity computation
- Balanced Dataset: Automatic generation of positive/negative pairs
- Mixed Precision: Optimized training with automatic mixed precision when a GPU is available (see the sketch after this list)
- Visualization: Comprehensive similarity visualization with color-coded results
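The mixed-precision item above can be illustrated with the standard PyTorch AMP pattern (a generic sketch assuming a single-GPU setup, not the notebook's exact code):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
scaler = torch.cuda.amp.GradScaler(enabled=device.type == "cuda")

def train_step(model, criterion, optimizer, img1, img2, labels):
    optimizer.zero_grad(set_to_none=True)
    # Forward pass in reduced precision where safe; disabled automatically on CPU
    with torch.autocast(device_type=device.type, enabled=device.type == "cuda"):
        logits = model(img1.to(device), img2.to(device))
        loss = criterion(logits, labels.to(device))
    scaler.scale(loss).backward()                     # scaled backward avoids fp16 underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```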
```
Input (28×28×1) → Conv2d(1→32) → BatchNorm → ReLU → MaxPool(2×2) →
Conv2d(32→64) → BatchNorm → ReLU → MaxPool(2×2) →
Flatten → Linear(3136→128) → BatchNorm → ReLU → L2_Normalize → Output (128-dim)
```
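A PyTorch module matching this layer diagram might look as follows; the 3×3 kernels with padding 1 are assumptions chosen so the flattened size comes out to 3136, the rest follows the diagram.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Maps a 28×28×1 image to a 128-dimensional L2-normalized embedding."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 28×28 → 28×28
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 28×28 → 14×14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 14×14 → 14×14
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                              # 14×14 → 7×7
        )
        self.head = nn.Sequential(
            nn.Flatten(),                                 # 64 × 7 × 7 = 3136
            nn.Linear(64 * 7 * 7, embedding_dim),
            nn.BatchNorm1d(embedding_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.head(self.features(x))
        return F.normalize(z, p=2, dim=1)                 # unit-length output
```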
- Pair Generation: Creates balanced pairs of similar/dissimilar images
- Forward Pass: Computes embeddings for both images in a pair
- Similarity Calculation: Uses cosine similarity between normalized embeddings
- Loss Computation: Binary cross-entropy loss for classification
- Optimization: Adam optimizer with learning rate scheduling
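Putting the listed hyperparameters and these steps together, one training epoch could look roughly like this; `EmbeddingNet`, `SimilarityModel`, and `train_pairs` are the hypothetical pieces sketched above, and the scheduler is stepped on the training loss here for brevity (the notebook tracks validation loss).

```python
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimilarityModel(EmbeddingNet()).to(device)

criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=2)

train_loader = DataLoader(train_pairs, batch_size=256, shuffle=True,
                          num_workers=2, pin_memory=True, persistent_workers=True)

for epoch in range(30):
    model.train()
    running_loss = 0.0
    for img1, img2, labels in train_loader:
        img1, img2, labels = (t.to(device, non_blocking=True) for t in (img1, img2, labels))
        optimizer.zero_grad(set_to_none=True)
        logits = model(img1, img2)            # cosine similarity → projected logit
        loss = criterion(logits, labels)      # binary cross-entropy with logits
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    epoch_loss = running_loss / len(train_loader)
    scheduler.step(epoch_loss)
    print(f"epoch {epoch + 1}: train loss {epoch_loss:.4f}")
```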
The trained model achieves:
- High accuracy in distinguishing between similar and dissimilar image pairs
- Meaningful similarity scores (higher for same-class pairs, lower for different-class pairs)
- Robust performance across different MNIST digit classes
Figure 1: Visualization of the PairDataset showing how similar and dissimilar image pairs are constructed from MNIST digits.
Figure 2: Structure of the paired dataset showing labels where 1 indicates similar pairs (same class) and 0 indicates dissimilar pairs (different classes).
Figure 3: 3D visualization of the EmbeddingNet architecture showing the convolutional blocks and fully connected layers.
Figure 4: Visual representation of the Similarity Model showing how cosine similarity is computed between image embeddings.
Figure 5: Example pairs of images with their predicted similarity scores from the trained model.
Figure 6: Additional examples showing comprehensive similarity comparisons between a test image and various digit classes.
- Similarity Matrix: Shows how a test image compares to all digit classes
- Color Coding: Green for high similarity, red for low similarity
- Ranked Results: Displays digits ranked by similarity score
- Probability Scores: Both raw cosine similarity and sigmoid probability
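A sketch of how such a per-class comparison could be computed; `get_raw_similarity` is the method shown in the usage example below, while the helper name and the one-example-per-class setup are assumptions for illustration.

```python
import torch

@torch.no_grad()
def rank_classes_by_similarity(model, test_img, class_examples):
    """Compare one test image against one example image per digit class.

    class_examples: dict mapping digit -> tensor of shape (1, 1, 28, 28).
    Returns (digit, cosine similarity, sigmoid probability) tuples, best match first.
    """
    model.eval()
    rows = []
    for digit, example in class_examples.items():
        sim = model.get_raw_similarity(test_img, example)    # raw cosine similarity
        prob = torch.sigmoid(model(test_img, example))       # projected probability
        rows.append((digit, sim.item(), prob.item()))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```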
- Memory Optimization: Uses `pin_memory=True` and `non_blocking=True` for faster GPU transfers
- Reproducibility: Includes proper random seeding and deterministic behavior options
- Efficiency: Implements efficient data loading with multiple workers and persistent workers
- Checkpointing: Automatically saves the best model as "best_model.pt" based on validation loss
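The checkpointing behavior amounts to a few lines in the validation phase (a sketch; the surrounding validation loop and the use of a `state_dict` checkpoint are assumptions):

```python
import torch

best_val_loss = float("inf")

def maybe_checkpoint(model, val_loss):
    """Save the weights as 'best_model.pt' whenever validation loss improves."""
    global best_val_loss
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pt")
```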
After running the training cells in the notebook, you can use the model to compute similarities:
```python
# The trained model is available in the notebook context
model.eval()

# Compute similarity between two images
with torch.no_grad():
    similarity = model.get_raw_similarity(img1, img2)
    print(f"Similarity score: {similarity.item():.4f}")

    # Or get the projected similarity logit
    logit = model(img1, img2)
    probability = torch.sigmoid(logit)
    print(f"Similarity probability: {probability.item():.4f}")
```
This project is available for educational and research purposes.
Feel free to submit issues, feature requests, and pull requests to improve the implementation.