STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables
Authors: Jaehyun Nam, Jihoon Tack, Kyungmin Lee, Hankook Lee, Jinwoo Shin
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that our simple framework brings significant performance gain under various tabular few-shot learning benchmarks, compared to prior semi- and self-supervised baselines. In this section, we validate the effectiveness of our method on few-shot tabular learning scenarios under various tabular datasets from the OpenML-CC18 benchmark (Bischl et al., 2021). Our results exhibit that STUNT consistently and significantly outperforms other methods, including unsupervised, semi- and self-supervised methods (Section 4.1). We further demonstrate that our method is even effective for few-shot multi-task learning (Section 4.2). Finally, we perform an ablation study to verify the effect of the proposed pseudo-validation scheme of our approach (Section 4.3). |
| Researcher Affiliation | Collaboration | Jaehyun Nam¹, Jihoon Tack¹, Kyungmin Lee¹, Hankook Lee², Jinwoo Shin¹ (¹Korea Advanced Institute of Science and Technology (KAIST), ²LG AI Research) |
| Pseudocode | Yes | Algorithm 1 STUNT: Self-generated Tasks from UNlabeled Tables |
| Open Source Code | Yes | Code is available at https://github.com/jaehyun513/STUNT. We provide code for reproduction in the supplementary material and describe the implementation details in Appendix B. |
| Open Datasets | Yes | We verify the effectiveness of STUNT through extensive evaluations on various datasets in the OpenML-CC18 benchmark (Vanschoren et al., 2014; Bischl et al., 2021). We select 8 datasets from the OpenML-CC18 benchmark (Bischl et al., 2021; Asuncion & Newman, 2007). |
| Dataset Splits | Yes | Common setup. For all the datasets, 80% of the data is used for training (unlabeled except for few-shot labeled samples) and 20% for testing, except for the income dataset, for which separate training and test splits are already provided. For STUNT, we use 20% of training data for pseudo-validation. (A hedged split-and-preprocessing sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., specific GPU or CPU models, memory sizes, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and libraries such as CatBoost, the Adam optimizer, and k-nearest neighbors, but does not provide specific version numbers for these or for the underlying programming languages or frameworks. |
| Experiment Setup | Yes | Common setup. For all the datasets, 80% of the data is used for training (unlabeled except for few-shot labeled samples) and 20% for testing, except for the income dataset, for which separate training and test splits are already provided. For STUNT, we use 20% of training data for pseudo-validation. We one-hot encode categorical features following the preprocessing of SubTab (Ucar et al., 2021), then apply normalization by subtracting the mean and dividing by the standard deviation for the income dataset, and min-max scaling for the other datasets. All baselines and STUNT are trained for 10K steps, while we follow the original training setting for CACTUs (Hsu et al., 2018). For all methods, we train a 2-layer multi-layer perceptron (MLP) with a hidden dimension of 1024. We provide additional information in Appendix A. (Hedged sketches of the preprocessing pipeline and the MLP backbone follow the table.) |
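The split and preprocessing protocol quoted in the Dataset Splits and Experiment Setup rows (80%/20% train/test, 20% of the training portion held out as pseudo-validation, one-hot encoding of categorical features, mean/standard-deviation normalization for the income dataset and min-max scaling elsewhere) can be sketched as follows. This is a minimal illustration, assuming scikit-learn and pandas; the function name, column arguments, and random seed are hypothetical and are not taken from the authors' released code.

```python
# Hedged sketch of the reported data protocol; library choices and names are assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

def split_and_preprocess(df: pd.DataFrame, categorical_cols, numeric_cols,
                         standardize: bool = False, seed: int = 0):
    """80/20 train/test split, then 20% of the training data as pseudo-validation."""
    train_df, test_df = train_test_split(df, test_size=0.2, random_state=seed)
    train_df, pseudo_val_df = train_test_split(train_df, test_size=0.2, random_state=seed)

    # One-hot encode categorical features (following the SubTab-style preprocessing).
    # Note: scikit-learn < 1.2 uses `sparse=False` instead of `sparse_output=False`.
    encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
    encoder.fit(train_df[categorical_cols])

    # income dataset: subtract mean / divide by std; other datasets: min-max scaling.
    scaler = StandardScaler() if standardize else MinMaxScaler()
    scaler.fit(train_df[numeric_cols])

    def transform(split_df):
        cat = encoder.transform(split_df[categorical_cols])
        num = scaler.transform(split_df[numeric_cols])
        return np.concatenate([num, cat], axis=1)

    return transform(train_df), transform(pseudo_val_df), transform(test_df)
```

In this sketch the encoder and scaler are fit on the training portion only, so the pseudo-validation and test splits see no statistics computed from their own rows; whether the authors fit the scalers exactly this way is not stated in the section.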
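The backbone and optimization budget from the Experiment Setup row (a 2-layer MLP with a hidden dimension of 1024, trained for 10K steps; the Software Dependencies row also mentions the Adam optimizer) can be sketched as below. PyTorch, the learning rate, the output dimension, the loss function, and the placeholder batch sampler are assumptions rather than details confirmed by the section.

```python
# Hedged sketch of the backbone/training loop; only the 2-layer MLP, the 1024 hidden
# dimension, the 10K-step budget, and the use of Adam come from the report above.
import torch
import torch.nn as nn

class MLP(nn.Module):
    """2-layer MLP backbone with a hidden dimension of 1024."""
    def __init__(self, in_dim: int, hidden_dim: int = 1024, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train(model: nn.Module, loss_fn, sample_batch, num_steps: int = 10_000, lr: float = 1e-3):
    """Generic Adam training loop; `sample_batch` is a hypothetical task/batch sampler."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_steps):
        x, y = sample_batch()  # e.g., inputs and (pseudo-)labels for one training step
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```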