
Drift-Resilient TabPFN is a method using In-Context Learning via a Prior-Data Fitted Network, to address temporal distribution shifts in tabular data, outperforming existing methods in terms of performance and calibration.


automl/Drift-Resilient_TabPFN


Drift-Resilient TabPFN

Drift-Resilient TabPFN is an evolution of TabPFN, specifically designed to handle temporal distribution shifts in tabular data. Using In-Context Learning with a Prior-Data Fitted Network, it learns to recognize and adapt to changes in data distributions over time.

Pre-trained on millions of synthetic datasets generated by evolving structural causal models (SCMs), this framework effectively predicts in scenarios where data distributions are non-stationary.
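To make the idea of an evolving SCM concrete, here is a minimal toy sketch in NumPy. It is an illustration only and not the prior used for pre-training: the two-node graph, the linear mechanism, the linear weight schedule, and the function name sample_drifting_scm are all assumptions made for this example.

```python
import numpy as np

def sample_drifting_scm(n_per_step=200, n_steps=5, seed=0):
    """Sample from a toy two-node SCM whose edge weight changes with time.

    Hypothetical illustration: the graph x1 -> x2 -> y and the linear
    weight schedule are assumptions, not the actual pre-training prior.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for t in range(n_steps):
        # Edge weight x1 -> x2 moves linearly from +1 to -1 over time.
        w_t = 1.0 - 2.0 * t / (n_steps - 1)
        x1 = rng.normal(size=n_per_step)                   # root cause
        x2 = w_t * x1 + 0.1 * rng.normal(size=n_per_step)  # child of x1
        y = (x2 > 0).astype(int)                           # label depends on x2
        rows.append(np.column_stack([np.full(n_per_step, t), x1, x2, y]))
    # Columns: time step, x1, x2, label.
    return np.vstack(rows)

data = sample_drifting_scm()
for t in range(5):
    step = data[data[:, 0] == t]
    # The x1/label correlation changes sign as the edge weight drifts.
    corr = np.corrcoef(step[:, 1], step[:, 3])[0, 1]
    print(t, round(float(corr), 2))
```

The relationship between x1 and the label is strongly positive at the first time step and strongly negative at the last, which is the kind of non-stationarity a model fitted on early data alone would miss.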

The code is released under the Prior Labs License.

Installation

Run pip install . in the root directory of this repository, or install the package directly from the repository.

Example Usage & Reproducibility

To explore how to use Drift-Resilient TabPFN and to reproduce the results presented in our paper, we provide a comprehensive demonstration in the following Google Colab notebook:

👉 Drift-Resilient-TabPFN_Demo.ipynb

This notebook walks through:

  • Setting up the environment
  • Running inference with the Drift-Resilient TabPFN model
  • Exploring the synthetic evaluation datasets
  • Plotting decision boundaries
  • Evaluating the performance of our model alongside baselines under temporal distribution shifts
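The evaluation under temporal distribution shift listed above can be pictured with a small self-contained sketch: train on the earliest time domains, test on the later ones, and report both accuracy and a calibration measure. This is not the notebook's code. The toy data, the helper names (temporal_split, expected_calibration_error, fit_logistic, predict_proba), and the gradient-descent logistic regression used as a stand-in model are all assumptions for illustration; neither Drift-Resilient TabPFN nor the paper's baselines are used here.

```python
import numpy as np

def temporal_split(t, n_train_domains):
    """Boolean masks: train on the earliest domains, test on the rest."""
    train = t < n_train_domains
    return train, ~train

def expected_calibration_error(p_pos, y, n_bins=10):
    """Binary ECE: bin by confidence, then average |accuracy - confidence|."""
    pred = (p_pos >= 0.5).astype(int)
    conf = np.where(pred == 1, p_pos, 1.0 - p_pos)
    correct = (pred == y).astype(float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Weight each bin by the fraction of samples it holds.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Plain gradient-descent logistic regression (stand-in baseline)."""
    Xb = np.column_stack([X, np.ones(len(X))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Probability of the positive class under the fitted weights."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# Toy data (assumption): the true decision boundary rotates 15 degrees per domain.
rng = np.random.default_rng(0)
n_domains, n_per = 8, 300
t = np.repeat(np.arange(n_domains), n_per)
X = rng.normal(size=(n_domains * n_per, 2))
theta = np.deg2rad(15.0 * t)
y = (X[:, 0] * np.cos(theta) + X[:, 1] * np.sin(theta) > 0).astype(int)

train, test = temporal_split(t, n_train_domains=4)
w = fit_logistic(X[train], y[train])
p_train, p_test = predict_proba(w, X[train]), predict_proba(w, X[test])
acc_train = ((p_train >= 0.5) == y[train]).mean()
acc_test = ((p_test >= 0.5) == y[test]).mean()
ece_test = expected_calibration_error(p_test, y[test])
print(f"train acc={acc_train:.2f}  test acc={acc_test:.2f}  test ECE={ece_test:.2f}")
```

Because the stand-in model ignores time, its accuracy drops on the later domains. Any classifier that exposes class probabilities could be swapped in for fit_logistic and predict_proba to compare models under the same split.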

The notebook is also available directly in the GitHub repository, where it can be downloaded and run locally if preferred.

Our Paper

For more detailed information, please refer to our NeurIPS 2024 paper:

Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data

If you use our work in your research, please cite us:

@inproceedings{
  helli2024driftresilient,
  title={Drift-Resilient Tab{PFN}: In-Context Learning Temporal Distribution Shifts on Tabular Data},
  author={Kai Helli and David Schnurr and Noah Hollmann and Samuel M{\"u}ller and Frank Hutter},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=p3tSEFMwpG}
}

Dataset Sources & Licensing

| Dataset name | Original source / citation | License |
| --- | --- | --- |
| Rotated Two Moons | Nasery et al. (2021); Bai et al. (2023), derivative of 2-Moons | Generated with scikit-learn (BSD-3-Clause) |
| Rotating Hyperplane | Montiel et al. (2018), scikit-multiflow HyperplaneGenerator | BSD-3-Clause |
| RandomRBF Drift | Montiel et al. (2018), scikit-multiflow RandomRBFGeneratorDrift | BSD-3-Clause |
| Free Light Chain Mortality | Dispenzieri et al. (2012); Kyle et al. (2006); part of R datasets | GPL-3 |
| Electricity | Harries (1999); Gama et al. (2004), OpenML #151 | Public Domain |
| Absenteeism at Work | Martiniano & Ferreira (2018), UCI ML Repo, DOI 10.24432/C5X882 | CC BY 4.0 |
| Heart Disease Dataset | Janosi et al. (1988), UCI ML Repo, DOI 10.24432/C52P4X | CC BY 4.0 |
| Parking Birmingham | Stolfi (2019), UCI ML Repo, DOI 10.24432/C51K5Z | CC BY 4.0 |
| Ames Housing Prices | De Cock (2011), OpenML #41211 | Public Domain |
| Folktables US Census | Ding et al. (2021) | US Gov. Work (downloaded, not provided) |
| Chess | Žliobaitė (2011) | - |
| Airlines Delay | Bifet & Ikonomovska (2009), OpenML #1169 | Public Domain |
| Istanbul Stock Exchange | Akbilgic (2013), UCI ML Repo, DOI 10.24432/C54P4J | CC BY 4.0 |
| Indian Liver Patient | Ramana & Venkateswarlu (2012), UCI ML Repo, DOI 10.24432/C5D02C | CC BY 4.0 |
| Diabetes 130-US Hospitals | Clore et al. (2014), UCI ML Repo, DOI 10.24432/C5230J | CC BY 4.0 |
| Diabetes Prediction through Questionnaire | Islam et al. (2020), UCI ML Repo, DOI 10.24432/C5VG8H | CC BY 4.0 |
| Pima Indians Diabetes | Smith et al. (1988), OpenML #37 | Public Domain |
| São Paulo Urban Traffic Behavior | Ferreira et al. (2018), UCI ML Repo, DOI 10.24432/C5902F | CC BY 4.0 |
| Room Occupancy Detection | Candanedo (2016), UCI ML Repo, DOI 10.24432/C5X01N | CC BY 4.0 |
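As an illustration of how a synthetic benchmark such as Rotated Two Moons can be constructed, the sketch below draws the two-moons shape and rotates it a little further in each time domain. It is a hedged reconstruction rather than the benchmark's actual generator: the number of domains, the 18-degree step, the noise level, and the function names are assumptions, and the moons are drawn directly with NumPy (following the same half-circle construction as scikit-learn's make_moons) to keep the example dependency-light.

```python
import numpy as np

def make_two_moons(n_samples, noise, rng):
    """Two interleaving half circles with Gaussian noise added."""
    n_out = n_samples // 2
    n_in = n_samples - n_out
    a_out = np.linspace(0, np.pi, n_out)
    a_in = np.linspace(0, np.pi, n_in)
    outer = np.column_stack([np.cos(a_out), np.sin(a_out)])
    inner = np.column_stack([1 - np.cos(a_in), 0.5 - np.sin(a_in)])
    X = np.vstack([outer, inner]) + rng.normal(scale=noise, size=(n_samples, 2))
    y = np.concatenate([np.zeros(n_out, dtype=int), np.ones(n_in, dtype=int)])
    return X, y

def rotated_two_moons(n_domains=10, step_deg=18.0, n_per_domain=200,
                      noise=0.1, seed=0):
    """Stack one two-moons sample per time domain, each rotated further.

    The defaults (10 domains, 18 degrees per step) are assumptions chosen
    for illustration, not values taken from this repository.
    """
    rng = np.random.default_rng(seed)
    Xs, ys, ts = [], [], []
    for d in range(n_domains):
        X, y = make_two_moons(n_per_domain, noise, rng)
        theta = np.deg2rad(d * step_deg)
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        Xs.append(X @ R.T)  # rotate every point counter-clockwise by theta
        ys.append(y)
        ts.append(np.full(n_per_domain, d))
    return np.vstack(Xs), np.concatenate(ys), np.concatenate(ts)

X, y, t = rotated_two_moons()
print(X.shape, y.shape, np.unique(t))
```

The returned t array records the time domain of each sample, so earlier domains can serve as training context and later ones as the shifted test set.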

Every dataset here has been shared after a careful license check, and we believe public redistribution is allowed. If you hold rights to any data in this repo and think it shouldn’t be hosted, please open a GitHub issue with the details. We’ll review it quickly, remove the dataset, and wipe it from the commit history.
