Drift-Resilient TabPFN is an evolution of TabPFN, specifically designed to handle temporal distribution shifts in tabular data. Using In-Context Learning with a Prior-Data Fitted Network, it learns to recognize and adapt to changes in data distributions over time.
Pre-trained on millions of synthetic datasets generated by evolving structural causal models (SCMs), the model is built to make accurate predictions when the data distribution is non-stationary.
The code is released under the Prior Labs License.
Simply run `pip install .` in the root directory of this repository, or install it as a package from the repository.
To explore how to use Drift-Resilient TabPFN and to reproduce the results presented in our paper, we provide a comprehensive demonstration in the following Google Colab notebook:
👉 Drift-Resilient-TabPFN_Demo.ipynb
This notebook walks through:
- Setting up the environment
- Running inference with the Drift-Resilient TabPFN model
- Exploring the synthetic evaluation datasets
- Plotting decision boundaries
- Evaluating the performance of our model alongside baselines under temporal distribution shifts
The notebook is also available directly in the GitHub repository, where it can be downloaded and run locally if preferred.
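To make the idea of evaluating under a temporal distribution shift concrete, here is a minimal sketch. It does not use the Drift-Resilient TabPFN API and is not taken from the notebook. The two-moons generator is hand-rolled in NumPy, the nearest-centroid classifier is a stand-in baseline, and the number of domains, the rotation step, and the noise level are illustrative assumptions rather than the benchmark settings.

```python
import numpy as np


def make_two_moons(n_per_class, noise, rng):
    """Sample two interleaving half circles with Gaussian noise (toy generator)."""
    t = rng.uniform(0.0, np.pi, size=n_per_class)
    upper = np.column_stack([np.cos(t), np.sin(t)])              # class 0
    lower = np.column_stack([1.0 - np.cos(t), 0.5 - np.sin(t)])  # class 1
    X = np.vstack([upper, lower])
    X = X + rng.normal(scale=noise, size=X.shape)
    y = np.repeat([0, 1], n_per_class)
    return X, y


def rotate(X, degrees):
    """Rotate 2-D points counter-clockwise around the origin."""
    theta = np.deg2rad(degrees)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    return X @ R.T


def fit_centroids(X, y):
    """Stand-in baseline: store one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def predict(centroids, X):
    """Assign each point to the class of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
n_domains, step_deg = 10, 18.0  # assumed values, chosen for illustration

# One dataset per time step; each later domain is rotated a bit further.
domains = []
for k in range(n_domains):
    X, y = make_two_moons(200, 0.1, rng)
    domains.append((rotate(X, step_deg * k), y))

# Fit on the earliest domain only, then score every domain in temporal order.
X_train, y_train = domains[0]
centroids = fit_centroids(X_train, y_train)
accuracies = [float((predict(centroids, X) == y).mean()) for X, y in domains]

for k, acc in enumerate(accuracies):
    print(f"domain {k} (rotation {step_deg * k:5.1f} deg): accuracy {acc:.2f}")
```

Because the baseline is fit once on the earliest domain and never updated, its accuracy typically falls as the rotation grows. A static model of this kind is the reference point against which a drift-aware model is compared.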
For more detailed information, please refer to our NeurIPS 2024 paper:
Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data
If you use our work in your research, please cite us:
@inproceedings{
helli2024driftresilient,
title={Drift-Resilient Tab{PFN}: In-Context Learning Temporal Distribution Shifts on Tabular Data},
author={Kai Helli and David Schnurr and Noah Hollmann and Samuel M{\"u}ller and Frank Hutter},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=p3tSEFMwpG}
}
Dataset name | Original source / citation | License |
---|---|---|
Rotated Two Moons | Nasery et al. (2021); Bai et al. (2023) — derivative of 2-Moons | Generated with scikit-learn (BSD-3-Clause) |
Rotating Hyperplane | Montiel et al. (2018) — scikit-multiflow HyperplaneGenerator | BSD-3-Clause |
RandomRBF Drift | Montiel et al. (2018) — scikit-multiflow RandomRBFGeneratorDrift | BSD-3-Clause |
Free Light Chain Mortality | Dispenzieri et al. (2012); Kyle et al. (2006); Part of R datasets | GPL-3 |
Electricity | Harries (1999); Gama et al. (2004) — OpenML #151 | Public Domain |
Absenteeism at Work | Martiniano & Ferreira (2018) — UCI ML Repo, DOI 10.24432/C5X882 | CC BY 4.0 |
Heart Disease Dataset | Janosi et al. (1988) — UCI ML Repo, DOI 10.24432/C52P4X | CC BY 4.0 |
Parking Birmingham | Stolfi (2019) — UCI ML Repo, DOI 10.24432/C51K5Z | CC BY 4.0 |
Ames Housing Prices | De Cock (2011) — OpenML #41211 | Public Domain |
Folktables US Census | Ding et al. (2021) | US Gov. Work (downloaded, not provided) |
Chess | Žliobaitė (2011) | - |
Airlines Delay | Bifet & Ikonomovska (2009) — OpenML #1169 | Public Domain |
Istanbul Stock Exchange | Akbilgic (2013) — UCI ML Repo, DOI 10.24432/C54P4J | CC BY 4.0 |
Indian Liver Patient | Ramana & Venkateswarlu (2012) — UCI ML Repo, DOI 10.24432/C5D02C | CC BY 4.0 |
Diabetes 130-US Hospitals | Clore et al. (2014) — UCI ML Repo, DOI 10.24432/C5230J | CC BY 4.0 |
Diabetes Prediction through Questionnaire | Islam et al. (2020) — UCI ML Repo, DOI 10.24432/C5VG8H | CC BY 4.0 |
Pima Indians Diabetes | Smith et al. (1988) — OpenML #37 | Public Domain |
São Paulo Urban Traffic Behavior | Ferreira et al. (2018) — UCI ML Repo, DOI 10.24432/C5902F | CC BY 4.0 |
Room Occupancy Detection | Candanedo (2016) — UCI ML Repo, DOI 10.24432/C5X01N | CC BY 4.0 |
Every dataset here has been shared after a careful license check, and we believe public redistribution is allowed. If you hold rights to any data in this repo and think it shouldn’t be hosted, open a GitHub issue with the details. We’ll review it quickly, remove the dataset, and wipe it from the commit history.