---
license: apache-2.0
pipeline_tag: tabular-regression
---

# TabPFNMix Regressor

The TabPFNMix regressor is a tabular foundation model pre-trained purely on synthetic datasets sampled from a mix of random regressors.

## Architecture

TabPFNMix is based on a 12-layer encoder-decoder Transformer with 37M parameters. It is pre-trained with a strategy that incorporates in-context learning, similar to the one used by TabPFN and TabForestPFN.
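
To make the pre-training setup more concrete, here is a toy sketch of how a single synthetic regression task could be sampled from a random regressor and split into an in-context set and a query set. It is an illustration only, not the actual TabPFNMix prior: the two function families (a random linear map and a random decision stump), the names `sample_random_regressor` and `sample_task`, and all sizes are assumptions chosen for brevity.

```python
import random


def sample_random_regressor(n_features, rng):
    """Return a randomly parameterised function f(x) -> float (toy stand-in)."""
    kind = rng.choice(["linear", "stump"])
    if kind == "linear":
        # Random linear map with Gaussian weights.
        weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        return lambda x: sum(w * v for w, v in zip(weights, x))
    # Random decision stump: one split on one feature, two leaf values.
    feature = rng.randrange(n_features)
    threshold = rng.gauss(0.0, 1.0)
    left, right = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return lambda x: left if x[feature] <= threshold else right


def sample_task(n_rows=64, n_features=4, n_context=48, seed=0):
    """Sample one synthetic dataset and split it into context and query rows."""
    rng = random.Random(seed)
    f = sample_random_regressor(n_features, rng)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    y = [f(x) for x in X]
    context = (X[:n_context], y[:n_context])  # rows shown with their targets
    query = (X[n_context:], y[n_context:])    # rows whose targets are predicted
    return context, query


if __name__ == "__main__":
    (X_ctx, y_ctx), (X_qry, y_qry) = sample_task()
    print(len(X_ctx), "context rows,", len(X_qry), "query rows")
```

In an in-context learning setup of this kind, the model would receive the context rows together with their targets and be trained to predict the query targets, with a fresh task sampled at every step.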

## Usage

To use TabPFNMix regressor, install AutoGluon by running:

```sh
pip install autogluon
```

The following minimal example shows how to fine-tune the TabPFNMix regressor and run inference with it:

```python
import pandas as pd

from autogluon.tabular import TabularPredictor


if __name__ == '__main__':
    # Load the training data and subsample it to keep the example fast.
    train_data = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
    subsample_size = 5000
    if subsample_size is not None and subsample_size < len(train_data):
        train_data = train_data.sample(n=subsample_size, random_state=0)
    test_data = pd.read_csv('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')

    # Pre-trained weights to load, plus the fine-tuning budget.
    tabpfnmix_default = {
        "model_path_classifier": "autogluon/tabpfn-mix-1.0-classifier",
        "model_path_regressor": "autogluon/tabpfn-mix-1.0-regressor",
        "n_ensembles": 1,
        "max_epochs": 30,
    }

    # Train only TabPFNMix, using the configuration above.
    hyperparameters = {
        "TABPFNMIX": [
            tabpfnmix_default,
        ],
    }

    # Predict the numeric "age" column as a regression target.
    label = "age"
    problem_type = "regression"

    predictor = TabularPredictor(
        label=label,
        problem_type=problem_type,
    )
    predictor = predictor.fit(
        train_data=train_data,
        hyperparameters=hyperparameters,
        verbosity=3,
    )

    # Score the fitted model on the held-out test set and print the results.
    predictor.leaderboard(test_data, display=True)
```
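
After fitting, per-row predictions can also be obtained from the predictor and scored by hand. The sketch below assumes the `predictor`, `test_data`, and `label` names from the example above; `rmse` and `score_predictor` are hypothetical helpers written for this illustration, not part of AutoGluon. The only AutoGluon call used is `predictor.predict`.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one value is required")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def score_predictor(predictor, test_data, label):
    """Predict on test_data with a fitted predictor and return the RMSE.

    Hypothetical helper: expects a fitted TabularPredictor and a DataFrame
    that still contains the label column.
    """
    y_pred = predictor.predict(test_data)  # one prediction per row
    return rmse(test_data[label].tolist(), y_pred.tolist())
```

For instance, `score_predictor(predictor, test_data, label)` could be called right after the leaderboard is printed. AutoGluon reports error metrics in higher-is-better form, so when RMSE is the evaluation metric the leaderboard's test score should be the negative of this value.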

## Citation

If you find TabPFNMix useful for your research, please consider citing the associated papers:

```bibtex
@article{erickson2020autogluon,
  title={Autogluon-tabular: Robust and accurate automl for structured data},
  author={Erickson, Nick and Mueller, Jonas and Shirkov, Alexander and Zhang, Hang and Larroy, Pedro and Li, Mu and Smola, Alexander},
  journal={arXiv preprint arXiv:2003.06505},
  year={2020}
}

@article{hollmann2022tabpfn,
  title={Tabpfn: A transformer that solves small tabular classification problems in a second},
  author={Hollmann, Noah and M{\"u}ller, Samuel and Eggensperger, Katharina and Hutter, Frank},
  journal={arXiv preprint arXiv:2207.01848},
  year={2022}
}

@article{breejen2024context,
  title={Why In-Context Learning Transformers are Tabular Data Classifiers},
  author={Breejen, Felix den and Bae, Sangmin and Cha, Stephen and Yun, Se-Young},
  journal={arXiv preprint arXiv:2405.13396},
  year={2024}
}
```

## License

This project is licensed under the Apache-2.0 License.