This project implements semantic segmentation using a MobileNet encoder with a UNet decoder. Each pixel is classified into one of 7 classes:
- 0: sky
- 1: water
- 2: bridge
- 3: obstacle
- 4: living_obstacle
- 5: background
- 6: self
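Each class index above is the value the model predicts for a pixel. The helper below is a minimal sketch that maps names to ids and checks a mask before training. It assumes each mask in `segmentations/` stores these indices directly as single-channel pixel values rather than RGB colors. `CLASS_NAMES`, `NAME_TO_ID`, and `check_mask` are hypothetical names, not part of `model.py`.

```python
import numpy as np

# Class indices as listed above. ASSUMPTION: masks store these ids directly
# as pixel values (single-channel PNGs), not as RGB colors.
CLASS_NAMES = [
    "sky", "water", "bridge", "obstacle",
    "living_obstacle", "background", "self",
]
NAME_TO_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}


def check_mask(mask: np.ndarray) -> dict:
    """Return {class_name: pixel_count} for a 2-D mask of class ids.

    Raises ValueError if the mask is not 2-D or contains ids outside 0..6.
    """
    if mask.ndim != 2:
        raise ValueError(f"expected a 2-D mask, got shape {mask.shape}")
    values = np.unique(mask)
    bad = values[(values < 0) | (values >= len(CLASS_NAMES))]
    if bad.size:
        raise ValueError(f"unexpected class ids in mask: {bad.tolist()}")
    counts = np.bincount(mask.ravel().astype(np.int64), minlength=len(CLASS_NAMES))
    return {name: int(c) for name, c in zip(CLASS_NAMES, counts)}


mask = np.array([[0, 0, 1], [5, 6, 1]], dtype=np.uint8)
print(check_mask(mask))
# {'sky': 2, 'water': 2, 'bridge': 0, 'obstacle': 0, 'living_obstacle': 0, 'background': 1, 'self': 1}
```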

Features:

- MobileNet Encoder: Pretrained MobileNetV2 for feature extraction
- UNet Decoder: U-Net architecture for pixel-wise classification
- GPU-Optimized: Requires a CUDA-capable GPU for optimal performance
- Comprehensive Metrics: Pixel accuracy and mean IoU tracking
- Visualization: Training curves, prediction visualization, confusion matrix
- Logging: Detailed logging throughout training
- Model Checkpointing: Saves the best model based on validation IoU

Setup:

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure your project directory is laid out as follows:

  ```
  ├── images/          # Input images (.png files)
  ├── segmentations/   # Segmentation masks (.png files)
  └── model.py         # Training script
  ```
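A hypothetical sketch of pairing each image with its mask and producing an 80/20 train/validation split. It assumes an image and its mask share the same filename. `find_pairs` and `split_pairs` are illustrative helpers; `model.py` may pair and split the data differently (for example, without a fixed seed).

```python
from __future__ import annotations

import random
from pathlib import Path


def find_pairs(root: str) -> list[tuple[Path, Path]]:
    """Pair images/<name>.png with segmentations/<name>.png under `root`.

    ASSUMPTION: an image and its mask share the same filename.
    Images without a matching mask are skipped.
    """
    root_path = Path(root)
    pairs = []
    for image_path in sorted((root_path / "images").glob("*.png")):
        mask_path = root_path / "segmentations" / image_path.name
        if mask_path.exists():
            pairs.append((image_path, mask_path))
    return pairs


def split_pairs(pairs, train_frac: float = 0.8, seed: int = 42):
    """Shuffle reproducibly and split into (train, val) lists.

    The fixed seed is a HYPOTHETICAL choice for reproducibility.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


# Example usage (run from the project root):
# train_pairs, val_pairs = split_pairs(find_pairs("."))
```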

Run the training script:

```bash
python model.py
```

Default training configuration:

- Data Split: 80% training, 20% validation
- Epochs: Up to 50, with early stopping based on validation IoU
- Batch Size: 8 (adjustable)
- Learning Rate: 0.001 with a StepLR scheduler
- Image Size: 224x224 pixels
- Preprocessing: Standard ImageNet normalization, as expected by the pretrained encoder
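The StepLR schedule and IoU-based early stopping can be sketched in plain Python. The closed form `base_lr * gamma ** (epoch // step_size)` matches PyTorch's `torch.optim.lr_scheduler.StepLR`. The README does not state `step_size`, `gamma`, or the early-stopping patience, so the defaults here are hypothetical. `EarlyStopping` is an illustrative class, not the one in `model.py`.

```python
from __future__ import annotations

import math


def step_lr(epoch: int, base_lr: float = 1e-3, step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate for a given (0-based) epoch under a StepLR schedule.

    Same closed form as torch.optim.lr_scheduler.StepLR:
    base_lr * gamma ** (epoch // step_size).
    step_size=10 and gamma=0.1 are HYPOTHETICAL defaults.
    """
    return base_lr * gamma ** (epoch // step_size)


class EarlyStopping:
    """Track validation IoU; signal a stop after `patience` epochs without improvement.

    patience=10 is a HYPOTHETICAL default; model.py may use a different rule.
    """

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_iou = -math.inf
        self.epochs_without_improvement = 0

    def step(self, val_iou: float) -> tuple[bool, bool]:
        """Record one epoch's validation IoU and return (is_new_best, should_stop)."""
        if val_iou > self.best_iou + self.min_delta:
            self.best_iou = val_iou
            self.epochs_without_improvement = 0
            return True, False  # a training loop would save best_model.pth here
        self.epochs_without_improvement += 1
        return False, self.epochs_without_improvement >= self.patience


# Example with made-up IoU values: improvement stalls after epoch 1,
# so with patience=2 the loop stops at epoch 3.
stopper = EarlyStopping(patience=2)
for epoch, val_iou in enumerate([0.40, 0.45, 0.44, 0.43, 0.50]):
    is_best, should_stop = stopper.step(val_iou)
    print(f"epoch {epoch}: lr={step_lr(epoch):.0e} iou={val_iou:.2f} best={is_best}")
    if should_stop:
        break
```

Saving the checkpoint only when `is_new_best` is true matches the "best model based on validation IoU" behavior described above.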

Output files:

- `best_model.pth`: Best model checkpoint
- `training_curves.png`: Loss, accuracy, and IoU curves
- `predictions_visualization.png`: Sample predictions
- `confusion_matrix.png`: Confusion matrix heatmap

Model architecture:

- Encoder: MobileNetV2 with multi-scale feature extraction
- Decoder: UNet-style decoder with skip connections
- Output: 7-class segmentation map
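The sketch below traces tensor shapes through a UNet-style decoder for a 224x224 input, showing how the skip connections line up. It assumes the encoder taps torchvision's MobileNetV2 at strides 2, 4, 8, and 16 (16, 24, 32, and 96 channels), with the 1280-channel stride-32 output as the bottleneck. These taps are common in MobileNetV2 UNets, but the exact layers in `model.py` are an assumption, and `DECODER_CHANNELS` is hypothetical.

```python
from __future__ import annotations

# (stride, channels) of torchvision MobileNetV2 feature maps often used as UNet
# skips. ASSUMPTION: model.py may tap different layers.
ENCODER_SKIPS = [(2, 16), (4, 24), (8, 32), (16, 96)]
BOTTLENECK = (32, 1280)  # final 1280-channel MobileNetV2 feature map
DECODER_CHANNELS = [256, 128, 64, 32, 16]  # HYPOTHETICAL decoder widths
NUM_CLASSES = 7


def decoder_shapes(input_size: int = 224) -> list[tuple[str, int, int]]:
    """Trace (stage, output channels, spatial size) through a UNet-style decoder."""
    if input_size % BOTTLENECK[0] != 0:
        raise ValueError("input size must be divisible by the encoder's total stride (32)")
    channels, size = BOTTLENECK[1], input_size // BOTTLENECK[0]
    trace = [("bottleneck", channels, size)]
    # Skips are consumed deepest-first; the final 2x upsampling (back to the
    # input resolution) has no skip, represented here as 0 extra channels.
    skips = list(reversed(ENCODER_SKIPS)) + [(1, 0)]
    for (skip_stride, skip_channels), out_channels in zip(skips, DECODER_CHANNELS):
        size *= 2  # upsample by 2 to match the skip's resolution (input_size // skip_stride)
        concat = channels + skip_channels  # concatenate along the channel axis
        trace.append((f"1/{skip_stride}: concat {channels}+{skip_channels}={concat}", out_channels, size))
        channels = out_channels  # a conv block reduces concat -> out_channels
    trace.append(("1x1 conv head", NUM_CLASSES, size))
    return trace


for stage, ch, hw in decoder_shapes():
    print(f"{stage:<32} -> {ch:4d} x {hw} x {hw}")
```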

Evaluation metrics:

- Pixel Accuracy: Percentage of correctly classified pixels
- Mean IoU: Average Intersection over Union across all classes
- Per-class IoU: Individual IoU for each semantic class
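All three metrics can be derived from a single pixel-level confusion matrix, the same quantity that `confusion_matrix.png` presumably plots. This is a minimal NumPy sketch, assuming predictions and targets are integer class-id arrays. The README does not say how classes absent from the evaluated data are treated, so excluding them from the mean (NaN) is an assumption.

```python
import numpy as np


def confusion_matrix(pred: np.ndarray, target: np.ndarray, num_classes: int = 7) -> np.ndarray:
    """Rows = ground-truth class, columns = predicted class."""
    idx = target.ravel().astype(np.int64) * num_classes + pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def segmentation_metrics(cm: np.ndarray) -> dict:
    """Pixel accuracy, mean IoU, and per-class IoU from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    pixel_acc = tp.sum() / cm.sum()
    # |pred OR gt| per class = predicted count + ground-truth count - overlap
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class_iou = np.where(union > 0, tp / union, np.nan)
    # ASSUMPTION: classes with an empty union are skipped in the mean.
    mean_iou = float(np.nanmean(per_class_iou))
    return {"pixel_accuracy": float(pixel_acc), "mean_iou": mean_iou, "per_class_iou": per_class_iou}


target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
print(segmentation_metrics(confusion_matrix(pred, target)))
```

In a validation loop, summing the per-batch confusion matrices before computing the metrics gives dataset-level IoU rather than an average of per-batch values.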
After training, you can use the saved model for inference on new images in several ways:

```bash
python simple_inference_example.py
```

This runs inference on a sample image and shows the basic usage pattern.

Single Image:

```bash
python inference.py --image_path path/to/your/image.png
```

Batch Processing:

```bash
python inference.py --input_folder path/to/image/folder
```

Demo with Training Data:

```bash
python inference.py  # Uses the first image from the training data
```

Using the inference functions from Python:

```python
from inference import load_model, preprocess_image, predict_segmentation

# Load the trained model
model = load_model('best_model.pth')

# Preprocess an image and run the model on it
image_tensor, original_size = preprocess_image('your_image.png')
prediction = predict_segmentation(model, image_tensor)
```
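The internals of `preprocess_image` and `predict_segmentation` are not shown here. As a hypothetical sketch of the array work around the model, the code below applies the standard ImageNet normalization to a 224x224 RGB image. It then resizes a predicted class map back to the original resolution with nearest-neighbour sampling, which avoids blending class ids. It assumes the prediction is a 2-D NumPy array of class ids and that `original_size` is PIL-style `(width, height)`; both are assumptions about `inference.py`.

```python
import numpy as np

# Standard ImageNet channel statistics used by torchvision's pretrained models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_image(rgb: np.ndarray) -> np.ndarray:
    """HxWx3 uint8 RGB image (already resized to 224x224) -> 1x3xHxW float32 array."""
    x = rgb.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the channel axis
    return x.transpose(2, 0, 1)[np.newaxis]  # HWC -> CHW, then add a batch dimension


def resize_labels_nearest(labels: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an HxW class-id map; never invents new class ids."""
    in_h, in_w = labels.shape
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return labels[rows[:, None], cols[None, :]]


# Hypothetical usage with the API above, ASSUMING `prediction` is a 224x224
# NumPy array of class ids and `original_size` is PIL-style (width, height):
# width, height = original_size
# full_res = resize_labels_nearest(np.asarray(prediction), height, width)
```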

Relevant files:

- `best_model.pth`: The trained model checkpoint (created during training)
- `inference_results/`: Directory containing inference outputs
- `*_result.png`: Visualization of predictions
- `*_prediction.npy`: Raw prediction arrays
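A short sketch for inspecting the raw `*_prediction.npy` files. It assumes each file holds an integer array of class ids in the order listed at the top of this README. `summarize_prediction` is a hypothetical helper.

```python
from pathlib import Path

import numpy as np

# Same order as the class list at the top of this README.
CLASS_NAMES = ["sky", "water", "bridge", "obstacle", "living_obstacle", "background", "self"]


def summarize_prediction(npy_path) -> dict:
    """Return the fraction of pixels assigned to each class.

    ASSUMPTION: the .npy file holds an integer array of class ids in 0..6.
    """
    pred = np.load(npy_path)
    counts = np.bincount(pred.ravel().astype(np.int64), minlength=len(CLASS_NAMES))
    total = counts.sum()
    return {name: float(count / total) for name, count in zip(CLASS_NAMES, counts)}


results_dir = Path("inference_results")
if results_dir.is_dir():
    for path in sorted(results_dir.glob("*_prediction.npy")):
        fractions = summarize_prediction(path)
        top = max(fractions, key=fractions.get)
        print(f"{path.name}: most frequent class is {top} ({fractions[top]:.1%})")
```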

The model is saved locally as `best_model.pth` after training. To use it elsewhere:

- Copy the `best_model.pth` file
- Copy the `model.py` file (contains the model architecture)
- Copy the `inference.py` file (contains the inference functions)
- Install the same dependencies (`requirements.txt`)

Model summary:

- Model size: ~2.3MB (lightweight MobileNet-based model)
- Input size: 224x224 pixels
- Output: Full-resolution segmentation maps
- Classes: 7 semantic classes (sky, water, bridge, obstacle, living_obstacle, background, self)
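Checkpoint size follows roughly from the parameter count, since float32 weights take 4 bytes each. The hypothetical helpers below compare that estimate with the file on disk. A checkpoint that also stores optimizer state or other metadata will be larger than the estimate. With PyTorch, the parameter count of a loaded model is `sum(p.numel() for p in model.parameters())`.

```python
import os

# Bytes per stored parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def estimate_weights_mb(num_params: int, dtype: str = "float32") -> float:
    """Approximate size of the raw weights in MB (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def checkpoint_mb(path: str = "best_model.pth") -> float:
    """Actual on-disk size of a checkpoint file in MB."""
    return os.path.getsize(path) / 1e6


# Example: 1 million float32 parameters occupy about 4 MB.
print(estimate_weights_mb(1_000_000))  # 4.0
if os.path.exists("best_model.pth"):
    print(f"best_model.pth: {checkpoint_mb():.1f} MB on disk")
```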