EXXA-Sequential-Test-Exoplanet-Transit-Detection

This repository contains my solution for the EXXA_3 task as part of the ML4Sci Google Summer of Code 2025. The project simulates realistic transit light curves from physical parameters and trains a classifier to detect the presence of an exoplanet in a given light curve.

Task Summary

  • Synthetic Data Generation:
    Transit light curves are simulated using the batman package. Physical parameters such as the planet-to-star radius ratio, semi-major axis, orbital inclination, and mid-transit time are randomly varied to produce a diverse dataset. Flat (non-transit) curves are generated by adding Gaussian noise to a constant flux. This yields a balanced dataset of transit (label 1) and non-transit (label 0) light curves (a minimal sketch of this step follows the list).

  • Classifier Training:
    A custom 1D convolutional neural network (CNN) is used to classify the synthetic light curves. The network processes the 1D time-series data (1000 points per curve) and outputs a single logit per sample, which is converted to a probability using the sigmoid function. The model is trained using binary cross-entropy loss (BCEWithLogitsLoss) with an 80/20 train/validation split.

  • Evaluation:
    Model performance is evaluated using standard metrics including validation accuracy, ROC curve, AUC, and a confusion matrix. The best-performing model is saved within the repository, and all evaluation code is included so that the entire pipeline can be run end-to-end with minimal effort.
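
For reference, the sketch below shows how such a dataset can be generated with batman. Parameter ranges, the orbital period, the noise level, and the flat_curve helper are illustrative assumptions, not necessarily the notebook's exact choices; the notebook's own simulate_transit_curve and create_dataset functions are the authoritative versions.

    # Illustrative sketch of the synthetic data generation described above.
    # Parameter ranges, the orbital period, and the noise level are assumed
    # values, not necessarily those used in the notebook.
    import numpy as np
    import batman

    N_POINTS = 1000  # points per light curve, as stated above

    def simulate_transit_curve(rng):
        """Simulate one noisy transit light curve with batman."""
        t = np.linspace(-0.25, 0.25, N_POINTS)    # days around mid-transit
        params = batman.TransitParams()
        params.t0 = rng.uniform(-0.05, 0.05)      # mid-transit time
        params.per = 3.0                          # orbital period [days]
        params.rp = rng.uniform(0.05, 0.15)       # planet-to-star radius ratio
        params.a = rng.uniform(8.0, 15.0)         # semi-major axis [stellar radii]
        params.inc = rng.uniform(87.0, 90.0)      # orbital inclination [deg]
        params.ecc = 0.0
        params.w = 90.0
        params.limb_dark = "quadratic"
        params.u = [0.3, 0.2]                     # limb-darkening coefficients
        flux = batman.TransitModel(params, t).light_curve(params)
        return flux + rng.normal(0.0, 1e-3, N_POINTS)   # Gaussian noise

    def flat_curve(rng):
        """Non-transit curve: constant flux plus Gaussian noise."""
        return 1.0 + rng.normal(0.0, 1e-3, N_POINTS)

    def create_dataset(n_per_class=500, seed=0):
        """Balanced dataset of transit (label 1) and flat (label 0) curves."""
        rng = np.random.default_rng(seed)
        X = np.stack([simulate_transit_curve(rng) for _ in range(n_per_class)]
                     + [flat_curve(rng) for _ in range(n_per_class)])
        y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
        return X.astype(np.float32), y.astype(np.float32)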

Model Overview

The classifier, implemented in PyTorch, is designed as a 1D CNN with the following architecture (a minimal code sketch follows the list):

  • Convolutional Layers:
    Three convolutional layers with increasing channel depth (32 → 64 → 128), followed by ReLU activations and max pooling (or adaptive pooling) to extract temporal features from the light curves.

  • Fully Connected Layers:
    After flattening the feature maps, a dropout layer is applied for regularization, followed by a linear layer that outputs a single logit per sample for binary classification.

  • Loss Function:
    The network uses BCEWithLogitsLoss, with target labels converted to float tensors with shape [N, 1].
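
The sketch below illustrates such an architecture; kernel sizes, the pooling arrangement, and the dropout rate are assumed values rather than the notebook's exact choices.

    # Illustrative PyTorch sketch of the 1D CNN described above
    # (32 -> 64 -> 128 channels, ReLU + pooling, dropout, single logit).
    # Kernel sizes and the dropout rate are assumed values.
    import torch
    import torch.nn as nn

    class TransitCNN(nn.Module):
        def __init__(self, dropout=0.5):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
                nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),           # collapse the time dimension
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Dropout(dropout),
                nn.Linear(128, 1),                 # single logit per sample
            )

        def forward(self, x):                      # x: [N, 1, 1000]
            return self.classifier(self.features(x))   # logits: [N, 1]

Because the network outputs raw logits, probabilities are obtained with torch.sigmoid at inference time, which pairs naturally with BCEWithLogitsLoss during training.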

Notebook Pipeline

The end-to-end pipeline in the notebook includes the following steps (a condensed sketch of the data preparation, training, and evaluation steps follows the list):

  1. Dependencies & Setup:
    Installation of required packages (e.g., numpy, matplotlib, torch, batman-package, scikit-learn) and mounting of Google Drive (if applicable).

  2. Synthetic Data Generation:
    A function (simulate_transit_curve) generates realistic transit light curves using the batman package. The create_dataset function builds a balanced dataset by randomly simulating transit and flat light curves.

  3. Data Preparation:
    The generated data (shape [N, 1000]) is converted to PyTorch tensors and reshaped to [N, 1, 1000]. Labels are converted to float tensors and unsqueezed to shape [N, 1].

  4. Model Training:
    The CNN classifier is trained over a number of epochs using an 80/20 train/validation split. A learning rate scheduler (ReduceLROnPlateau) is used, and the best-performing model (based on validation accuracy) is saved locally.

  5. Evaluation:
    The notebook computes validation metrics, including loss and accuracy, and then evaluates the best model by generating a confusion matrix, ROC curve, and AUC score.

  6. Inference:
    A final section demonstrates how to run inference on new synthetic light curves, showing the predicted probability of transit for a given curve.
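
The condensed sketch below ties the data preparation, training, and evaluation steps together. Batch size, learning rate, optimizer, and epoch count are assumed values, and create_dataset and TransitCNN refer to the sketches above rather than the notebook's exact code.

    # Condensed sketch of data preparation, training, and evaluation.
    # Batch size, learning rate, optimizer, and epoch count are assumed values;
    # create_dataset and TransitCNN refer to the sketches above.
    import torch
    import torch.nn as nn
    from torch.utils.data import TensorDataset, DataLoader, random_split
    from sklearn.metrics import roc_auc_score, confusion_matrix

    X, y = create_dataset()                        # [N, 1000] curves, [N] labels
    X_t = torch.from_numpy(X).unsqueeze(1)         # -> [N, 1, 1000]
    y_t = torch.from_numpy(y).unsqueeze(1)         # -> [N, 1] float labels

    dataset = TensorDataset(X_t, y_t)
    n_train = int(0.8 * len(dataset))              # 80/20 train/validation split
    train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=64)

    model = TransitCNN()
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3)

    best_acc = 0.0
    for epoch in range(20):
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()

        model.eval()
        val_loss, correct, total = 0.0, 0, 0
        probs, targets = [], []
        with torch.no_grad():
            for xb, yb in val_loader:
                logits = model(xb)
                val_loss += criterion(logits, yb).item() * len(xb)
                p = torch.sigmoid(logits)
                correct += ((p > 0.5).float() == yb).sum().item()
                total += len(xb)
                probs.append(p)
                targets.append(yb)
        acc = correct / total
        scheduler.step(val_loss / total)
        if acc > best_acc:                         # keep the best validation model
            best_acc = acc
            torch.save(model.state_dict(), "transit_classifier_best.pth")

    # ROC AUC and confusion matrix on the last validation pass
    probs = torch.cat(probs).numpy().ravel()
    targets = torch.cat(targets).numpy().ravel().astype(int)
    print("AUC:", roc_auc_score(targets, probs))
    print(confusion_matrix(targets, (probs > 0.5).astype(int)))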

Model & Results Access

  • Model File:
    The best-performing model is saved as transit_classifier_best.pth in the repository (a loading and inference sketch follows this list).
  • Results:
    All evaluation outputs (confusion matrix, ROC curve, AUC) are generated within the notebook. Since the dataset is synthetic, users can regenerate the data by running the notebook.
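
A minimal loading-and-inference sketch, assuming the TransitCNN and simulate_transit_curve definitions from the sketches above:

    # Illustrative loading-and-inference sketch, assuming the TransitCNN and
    # simulate_transit_curve definitions from the sketches above.
    import numpy as np
    import torch

    model = TransitCNN()
    model.load_state_dict(torch.load("transit_classifier_best.pth", map_location="cpu"))
    model.eval()

    curve = simulate_transit_curve(np.random.default_rng(42)).astype(np.float32)
    x = torch.from_numpy(curve).view(1, 1, -1)     # -> [1, 1, 1000]
    with torch.no_grad():
        prob = torch.sigmoid(model(x)).item()
    print(f"Predicted transit probability: {prob:.3f}")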

Notes

  • The synthetic dataset is generated using batman to simulate realistic transit curves. Adjust parameters in the simulate_transit_curve function if needed.
  • All code necessary to reproduce the training and evaluation pipeline is contained in the notebook.
  • This solution is designed to run end-to-end with minimal user intervention.

Install the dependencies using:

!pip install torch torchvision numpy matplotlib batman-package scikit-learn
