This repository contains code and utilities for the Synthetic2Real Object Detection Challenge 2 on Kaggle. The goal is to train a model that predicts bounding box coordinates for objects in images, using only synthetic data.
Repository layout:
_models.py
main.ipynb
view.ipynb
checkpoint.pt
sample_submission.csv
training_progress.png
train/
val/
testImages/
annotated_real_data/
- _models.py: Core model, dataset, training, evaluation, and visualization code.
- main.ipynb: Notebook for training and evaluating the model.
- view.ipynb: Notebook for visualizing predictions on images.
- checkpoint.pt: Saved model weights.
- sample_submission.csv: Example submission file for Kaggle.
- training_progress.png: Plot of training and validation loss.
- train/, val/: Each contains images (.jpg or .png) and a corresponding label file (e.g., .txt or .csv) for each image. Example:
      train/
          base/
              images/
                  img_001.jpg
                  img_002.jpg
                  ...
              labels/
                  img_001.txt
                  img_002.txt
                  ...
          cameraDistance/
              images/
                  img_001.jpg
                  img_002.jpg
                  ...
              labels/
                  img_001.txt
                  img_002.txt
                  ...
          ...
  The val/ and testImages/ folders follow the same structure as train/base/.
- (Optional) annotated_real_data/: Additional real annotated data.
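To illustrate the images/ and labels/ layout above, here is a minimal sketch that pairs each image with its label file by filename stem (img_001.jpg with img_001.txt). The helper names (pair_images_with_labels, read_box) are hypothetical and not taken from _models.py, and the label format (the last four numbers on the first line being the normalized cx, cy, w, h) is an assumption; check it against the real label files.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".png"}
LABEL_EXTS = (".txt", ".csv")


def pair_images_with_labels(split_dir):
    """Return (image_path, label_path) pairs for one folder such as train/base/.

    Assumes split_dir has images/ and labels/ subfolders and that each label
    file shares its image's stem. Images without a matching label are skipped.
    """
    split_dir = Path(split_dir)
    pairs = []
    for image_path in sorted((split_dir / "images").iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTS:
            continue  # ignore non-image files
        for ext in LABEL_EXTS:
            label_path = split_dir / "labels" / (image_path.stem + ext)
            if label_path.exists():
                pairs.append((image_path, label_path))
                break
    return pairs


def read_box(label_path):
    """Parse the first line of a label file into (cx, cy, w, h).

    Hypothetical format: whitespace- or comma-separated numbers whose last
    four values are the normalized box.
    """
    first_line = Path(label_path).read_text().strip().splitlines()[0]
    values = [float(v) for v in first_line.replace(",", " ").split()]
    return tuple(values[-4:])
```

Matching on the stem keeps the pairing independent of whether labels are stored as .txt or .csv.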
Install dependencies
This project requires Python 3.12+ and the following packages:
- torch
- torchvision
- matplotlib
- numpy
- tqdm
- pillow
Install them with pip:
pip install torch torchvision matplotlib numpy tqdm pillow
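After installing, a quick check like the following sketch can confirm that the interpreter meets the stated version and that every package is importable. The function names are hypothetical; note that pillow is imported under the name PIL.

```python
import importlib.util
import sys

# Import names for the packages listed above (pillow is imported as PIL).
REQUIRED_MODULES = ["torch", "torchvision", "matplotlib", "numpy", "tqdm", "PIL"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 12)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing packages:", missing_modules() or "none")
```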
Download and organize data
Download the competition data from Kaggle and extract it into the appropriate folders (train/, val/, testImages/, etc.) following the competition instructions.
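Before opening the notebooks, it can help to verify that the extracted data matches the expected layout. The sketch below is a hypothetical helper (check_layout is not part of this repository). It only checks for the top-level train/, val/, and testImages/ folders and for an images/ subfolder in val/, testImages/, and each subfolder of train/; annotated_real_data/ is optional and is not checked.

```python
from pathlib import Path

# Top-level folders named in this README; annotated_real_data/ is optional.
REQUIRED_DIRS = ["train", "val", "testImages"]


def check_layout(root="."):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = []
    for name in REQUIRED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}/")
    # train/ holds one subfolder per variant (base/, cameraDistance/, ...).
    image_dirs = [root / "val", root / "testImages"]
    if (root / "train").is_dir():
        image_dirs += [p for p in sorted((root / "train").iterdir()) if p.is_dir()]
    for folder in image_dirs:
        if folder.is_dir() and not (folder / "images").is_dir():
            problems.append(f"no images/ subfolder in {folder.relative_to(root)}/")
    return problems


if __name__ == "__main__":
    print(check_layout() or "layout looks OK")
```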
To train the model, open and run the cells in main.ipynb. This will:
- Load the datasets
- Initialize the model
- Train with early stopping
- Save the best model to checkpoint.pt
- Plot training and validation loss
After training, the notebook evaluates the model on the test set and prints the final loss.
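The README does not document the exact early-stopping rule used in main.ipynb. The following is a minimal sketch of the common patience-based approach: save a checkpoint whenever validation loss improves, and stop after a fixed number of epochs without improvement. The class name and the patience and min_delta defaults are hypothetical.

```python
import math


class EarlyStopping:
    """Track validation loss and signal when training should stop.

    `patience`: epochs without improvement tolerated before stopping.
    `min_delta`: smallest decrease in loss that counts as an improvement.
    Defaults are illustrative, not the values used in main.ipynb.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = math.inf
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss.

        Returns True when this epoch is the new best, i.e. when the caller
        should save a checkpoint.
        """
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
            return True
        self.epochs_without_improvement += 1
        return False

    @property
    def should_stop(self):
        return self.epochs_without_improvement >= self.patience


# Sketch of use inside a PyTorch loop (train_one_epoch and evaluate are hypothetical):
#   stopper = EarlyStopping(patience=5)
#   for epoch in range(max_epochs):
#       train_one_epoch(model, train_loader)
#       val_loss = evaluate(model, val_loader)
#       if stopper.step(val_loss):
#           torch.save(model.state_dict(), "checkpoint.pt")
#       if stopper.should_stop:
#           break
```

Keeping the stopping logic separate from the training loop makes it easy to test and to reuse across experiments.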
To visualize predictions, use view.ipynb. This notebook:
- Loads a trained model
- Displays images with predicted and (optionally) ground truth bounding boxes
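To draw a prediction, the normalized (cx, cy, w, h) output has to be converted to pixel coordinates. The sketch below shows one way to do this. The function name is hypothetical and may differ from the code in view.ipynb. It returns the top-left corner plus width and height, which is what matplotlib's Rectangle patch takes when drawn over an image displayed with imshow (origin at the top left).

```python
def to_pixel_corners(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x_min, y_min, width, height).

    (x_min, y_min) is the top-left corner in image coordinates
    (origin at top left, y pointing down).
    """
    cx, cy, w, h = box
    width = w * image_width
    height = h * image_height
    x_min = cx * image_width - width / 2
    y_min = cy * image_height - height / 2
    return x_min, y_min, width, height


# Example (requires matplotlib and an image array `img` of shape H x W x C):
#   from matplotlib.patches import Rectangle
#   x, y, w, h = to_pixel_corners(pred_box, img.shape[1], img.shape[0])
#   ax.imshow(img)
#   ax.add_patch(Rectangle((x, y), w, h, fill=False, edgecolor="red"))
```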
The model is a simple convolutional neural network, defined as CoordinateCNN in _models.py. For each image it predicts a single bounding box as four normalized values: the center coordinates and the box width and height.
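The README does not say which loss CoordinateCNN is trained with or how the competition scores submissions. Intersection over union (IoU) is a common way to compare a predicted (cx, cy, w, h) box with ground truth. The sketch below computes it for normalized boxes; treating IoU as the relevant metric here is an assumption.

```python
def cxcywh_to_corners(box):
    """(cx, cy, w, h) -> (x_min, y_min, x_max, y_max), all normalized to [0, 1]."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(box_a, box_b):
    """Intersection over union of two normalized (cx, cy, w, h) boxes."""
    ax0, ay0, ax1, ay1 = cxcywh_to_corners(box_a)
    bx0, by0, bx1, by1 = cxcywh_to_corners(box_b)
    # Overlap along each axis; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - intersection
    return intersection / union if union > 0 else 0.0
```

For example, two 0.5 x 0.5 boxes whose centers are 0.25 apart horizontally overlap in a 0.25 x 0.5 region, giving an IoU of 0.125 / 0.375 = 1/3.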
This project is licensed under the GNU GPL v3. See LICENSE for details.
Contact:
For questions or issues, please open an issue on this repository.