TabDPT: Scaling Tabular Foundation Models on Real Data

TabDPT is an open-source foundation model for tabular data based on in-context learning. It is trained on real-world data and can generalize to new tasks without additional training or hyperparameter tuning.
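
To make "in-context learning" concrete, here is a minimal toy sketch of the idea: the labelled rows are supplied as context at prediction time, and no parameters are fitted for the new task. This is an illustrative stand-in, not TabDPT's architecture or API. The function name `in_context_predict`, the distance-based softmax weighting, and the `temperature` parameter are all assumptions made for this example.

```python
import math

def in_context_predict(context_X, context_y, query_x, temperature=1.0):
    """Toy in-context classifier: softmax-weighted vote over labelled context rows.

    Illustrative stand-in only; this is NOT how TabDPT is implemented.
    """
    # Similarity score: negative squared Euclidean distance to each context row.
    scores = [
        -sum((a - b) ** 2 for a, b in zip(row, query_x)) / temperature
        for row in context_X
    ]
    # Numerically stable softmax over the context rows.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # Accumulate the normalised weight for each class label.
    votes = {}
    for w, label in zip(weights, context_y):
        votes[label] = votes.get(label, 0.0) + w / total
    return max(votes, key=votes.get), votes

# A new "task" is just a new labelled table passed in at prediction time.
context_X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
context_y = ["a", "a", "b", "b"]
label, probs = in_context_predict(context_X, context_y, [0.95, 1.0])
print(label, probs)
```

Swapping in a different `context_X` and `context_y` changes the task without any retraining step, which is the property the description above refers to.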

This repository provides the full training code to build your own TabDPT model. A lightweight inference interface is available separately; it supports evaluating either the existing TabDPT model or any new model trained with this repository.

Usage

We provide basic usage tips below. The details can be found by stepping through the code.

Installation

Before running the code, make sure to install the required Python packages:

pip install -r requirements.txt

You will also need a C compiler such as gcc for building some dependencies. On Ubuntu, you can install it with:

sudo apt-get update
sudo apt-get install build-essential
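
Before running pip, it can help to confirm that a C compiler is actually on your PATH. The following is a small sketch using only the Python standard library. The helper name `find_c_compiler` and the list of candidate compiler names are assumptions for illustration and are not part of this repository.

```python
import shutil

def find_c_compiler(candidates=("gcc", "cc", "clang")):
    """Return the path of the first C compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None

compiler = find_c_compiler()
if compiler is None:
    print("No C compiler found on PATH; install one before running pip install.")
else:
    print(f"Found C compiler: {compiler}")
```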

Training Example

To train a fresh TabDPT model with default hyperparameters on a single GPU, use the following command:

CUDA_VISIBLE_DEVICES=0 python train.py exp_name="TabDPT"

Multi-GPU Training Example

To train on multiple GPUs instead, launch the training script with torchrun:

CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 --rdzv_endpoint=localhost:29500 train.py \
  env.gpus="[4,5,6,7]" \
  exp_name="my_multi_gpu_test"

Notes:

Adjust nproc_per_node to the number of GPUs.
If you hit communication issues when running several multi-GPU training runs on the same node, give each run a different rdzv_endpoint port, since a port may already be in use by another run.
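
The two notes above can be sketched as a small helper that derives nproc_per_node from the GPU list and picks an unused port for rdzv_endpoint. This is a hedged sketch, not part of the repository: the function names `find_free_port` and `build_torchrun_command` are hypothetical, and the command layout simply mirrors the example above.

```python
import shlex
import socket

def find_free_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return sock.getsockname()[1]

def build_torchrun_command(gpus, exp_name, port=None):
    """Assemble the multi-GPU launch command shown above for a list of GPU indices."""
    if not gpus:
        raise ValueError("at least one GPU index is required")
    if port is None:
        port = find_free_port()
    gpu_csv = ",".join(str(g) for g in gpus)
    args = [
        "torchrun",
        f"--nproc_per_node={len(gpus)}",  # one process per GPU
        f"--rdzv_endpoint=localhost:{port}",
        "train.py",
        f"env.gpus=[{gpu_csv}]",
        f"exp_name={exp_name}",
    ]
    # Quote each argument so the brackets in env.gpus survive the shell.
    return f"CUDA_VISIBLE_DEVICES={gpu_csv} " + " ".join(shlex.quote(a) for a in args)

print(build_torchrun_command([4, 5, 6, 7], "my_multi_gpu_test", port=29501))
```

Note that a port reported as free can still be taken by another process before torchrun binds it, so treat the automatic choice as a convenience and not as a guarantee.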

Citation and Acknowledgements

If citing the paper, please use the following BibTeX:

@article{ma2024tabdpt,
  title={TabDPT: Scaling Tabular Foundation Models on Real Data},
  author={Ma, Junwei and Thomas, Valentin and Hosseinzadeh, Rasa and Kamkari, Hamidreza and Labach, Alex and Cresswell, Jesse C and Golestan, Keyvan and Yu, Guangwei and Caterini, Anthony L and Volkovs, Maksims},
  journal={arXiv preprint arXiv:2410.18164},
  year={2024}
}

Additionally, a huge thank you to Nafiseh Ghoroghchian for spearheading the effort of refactoring this codebase and making it fit for public consumption, and thank you to Roc Zhang for making the codebase compatible with safetensors.
