This repository contains my solution for the EXXA_3 task as part of the ML4Sci Google Summer of Code 2025. The goal of this project is to simulate realistic transit light curves using physical parameters and train a classifier to detect the presence of an exoplanet from these light curves.
- **Synthetic Data Generation:**
  Transit light curves are simulated using the `batman` package. Physical parameters such as the planet-to-star radius ratio, semi-major axis, orbital inclination, and mid-transit time are randomly varied to produce a diverse dataset. Flat (non-transit) curves are generated by adding Gaussian noise to a constant flux. This results in a balanced dataset of transit (label 1) and non-transit (label 0) light curves; a code sketch of this step appears after this list.
- **Classifier Training:**
  A custom 1D convolutional neural network (CNN) classifies the synthetic light curves. The network processes the 1D time-series data (1000 points per curve) and outputs a single logit per sample, which is converted to a probability with the sigmoid function. The model is trained with binary cross-entropy loss (`BCEWithLogitsLoss`) on an 80/20 train/validation split.
- **Evaluation:**
  Model performance is evaluated using standard metrics: validation accuracy, the ROC curve, AUC, and a confusion matrix. The best-performing model is saved within the repository, and all evaluation code is included so that the entire pipeline can be run end-to-end with minimal effort.
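The data-generation step might look like the following minimal sketch. It is not the exact notebook code: the parameter ranges, the fixed orbital period, the limb-darkening coefficients, the noise level, and the function signatures are illustrative assumptions; only `batman`'s documented `TransitParams`/`TransitModel` API is used.

```python
import numpy as np
import batman

N_POINTS = 1000  # samples per light curve, as described in this README


def simulate_transit_curve(rng):
    """Simulate one noisy transit light curve with randomly drawn parameters."""
    params = batman.TransitParams()
    params.t0 = rng.uniform(-0.01, 0.01)   # mid-transit time (days)
    params.per = 1.0                       # orbital period (days), fixed here
    params.rp = rng.uniform(0.05, 0.15)    # planet-to-star radius ratio
    params.a = rng.uniform(10.0, 20.0)     # semi-major axis (stellar radii)
    params.inc = rng.uniform(86.0, 90.0)   # orbital inclination (degrees)
    params.ecc = 0.0                       # eccentricity
    params.w = 90.0                        # longitude of periastron (degrees)
    params.u = [0.1, 0.3]                  # quadratic limb-darkening coefficients
    params.limb_dark = "quadratic"

    t = np.linspace(-0.05, 0.05, N_POINTS)  # time window around mid-transit
    flux = batman.TransitModel(params, t).light_curve(params)
    return flux + rng.normal(0.0, 1e-3, N_POINTS)  # add Gaussian noise


def create_dataset(n_samples, seed=0):
    """Build a balanced dataset: half transits (label 1), half flat curves (label 0)."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for i in range(n_samples):
        if i % 2 == 0:
            X.append(simulate_transit_curve(rng))
            y.append(1)
        else:
            # Flat curve: constant flux plus Gaussian noise.
            X.append(1.0 + rng.normal(0.0, 1e-3, N_POINTS))
            y.append(0)
    return np.array(X, dtype=np.float32), np.array(y, dtype=np.float32)
```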
The classifier, implemented in PyTorch, is designed as a 1D CNN with the following architecture (a model sketch follows the list):
- **Convolutional Layers:**
  Three convolutional layers with increasing channel depth (32 → 64 → 128), followed by ReLU activations and max pooling (or adaptive pooling) to extract temporal features from the light curves.
- **Fully Connected Layers:**
  After the feature maps are flattened, a dropout layer is applied for regularization, followed by a linear layer that outputs a single logit per sample for binary classification.
- **Loss Function:**
  The network uses `BCEWithLogitsLoss`, with target labels converted to float tensors of shape `[N, 1]`.
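A minimal PyTorch sketch of this architecture is shown below. The kernel sizes, pooling factors, and dropout rate are assumptions; the notebook's exact hyperparameters may differ.

```python
import torch
import torch.nn as nn


class TransitCNN(nn.Module):
    """1D CNN mapping a light curve of shape [N, 1, 1000] to one logit per sample."""

    def __init__(self, dropout=0.3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # collapse the temporal dimension
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),        # [N, 128, 1] -> [N, 128]
            nn.Dropout(dropout),
            nn.Linear(128, 1),   # single logit for binary classification
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```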
The end-to-end pipeline in the notebook includes the following steps (sketches of the training, evaluation, and inference steps follow the list):
- **Dependencies & Setup:**
  Installation of the required packages (e.g., numpy, matplotlib, torch, batman-package, scikit-learn) and mounting of Google Drive (if applicable).
- **Synthetic Data Generation:**
  A function (`simulate_transit_curve`) generates realistic transit light curves using the `batman` package. The `create_dataset` function builds a balanced dataset by randomly simulating transit and flat light curves (see the sketch after the overview list above).
- **Data Preparation:**
  The generated data (shape `[N, 1000]`) is converted to PyTorch tensors and reshaped to `[N, 1, 1000]`. Labels are converted to float tensors and unsqueezed to shape `[N, 1]`.
- **Model Training:**
  The CNN classifier is trained over a number of epochs using an 80/20 train/validation split. A `ReduceLROnPlateau` learning-rate scheduler is used, and the best-performing model (based on validation accuracy) is saved locally (see the training sketch below).
- **Evaluation:**
  The notebook computes validation metrics, including loss and accuracy, and then evaluates the best model by generating a confusion matrix, ROC curve, and AUC score (sketched below).
- **Inference:**
  A final section demonstrates how to run inference on new synthetic light curves, showing the predicted transit probability for a given curve (sketched below).
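Data preparation and the training loop can be sketched as follows, reusing `create_dataset` and `TransitCNN` from the sketches above. The batch size, epoch count, optimizer, learning rate, and scheduler settings are illustrative assumptions, not the notebook's exact values.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

X, y = create_dataset(4000)              # from the data-generation sketch
X_t = torch.from_numpy(X).unsqueeze(1)   # [N, 1000] -> [N, 1, 1000]
y_t = torch.from_numpy(y).unsqueeze(1)   # [N] -> [N, 1] float targets

dataset = TensorDataset(X_t, y_t)
n_train = int(0.8 * len(dataset))        # 80/20 train/validation split
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

model = TransitCNN()
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=3
)

best_acc = 0.0
for epoch in range(20):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

    # Validation: sigmoid(logit) > 0.5 counts as a predicted transit.
    model.eval()
    correct = 0
    with torch.no_grad():
        for xb, yb in val_loader:
            preds = (torch.sigmoid(model(xb)) > 0.5).float()
            correct += (preds == yb).sum().item()
    acc = correct / len(val_set)
    scheduler.step(acc)                  # plateau detection on validation accuracy
    if acc > best_acc:                   # checkpoint the best model
        best_acc = acc
        torch.save(model.state_dict(), "transit_classifier_best.pth")
```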
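Evaluation of the saved checkpoint might look like this sketch; the 0.5 decision threshold is an assumption, and the notebook additionally plots the ROC curve with matplotlib.

```python
import numpy as np
import torch
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve

model.load_state_dict(torch.load("transit_classifier_best.pth"))
model.eval()

probs, labels = [], []
with torch.no_grad():
    for xb, yb in val_loader:
        probs.append(torch.sigmoid(model(xb)).squeeze(1).numpy())
        labels.append(yb.squeeze(1).numpy())
probs = np.concatenate(probs)
labels = np.concatenate(labels).astype(int)

preds = (probs > 0.5).astype(int)        # assumed 0.5 decision threshold
print("Confusion matrix:\n", confusion_matrix(labels, preds))
print("AUC:", roc_auc_score(labels, probs))
fpr, tpr, _ = roc_curve(labels, probs)   # points for plotting the ROC curve
```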
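Inference on a single new curve, again reusing the hypothetical helpers from the sketches above:

```python
import numpy as np
import torch

model = TransitCNN()
model.load_state_dict(torch.load("transit_classifier_best.pth"))
model.eval()

curve = simulate_transit_curve(np.random.default_rng(42))      # one new synthetic curve
x = torch.from_numpy(curve.astype(np.float32)).view(1, 1, -1)  # shape [1, 1, 1000]
with torch.no_grad():
    prob = torch.sigmoid(model(x)).item()
print(f"Predicted transit probability: {prob:.3f}")
```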
- **Model File:**
  The best-performing model is saved as `transit_classifier_best.pth` in the repository.
- **Results:**
  All evaluation outputs (confusion matrix, ROC curve, AUC) are generated within the notebook. Since the dataset is synthetic, users can regenerate the data by running the notebook.
- The synthetic dataset is generated using `batman` to simulate realistic transit curves. Adjust parameters in the `simulate_transit_curve` function if needed.
- All code necessary to reproduce the training and evaluation pipeline is contained in the notebook.
- This solution is designed to run end-to-end with minimal user intervention.
Install the dependencies using:

```bash
pip install torch torchvision numpy matplotlib batman-package scikit-learn
```