STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables
Authors: Jaehyun Nam, Jihoon Tack, Kyungmin Lee, Hankook Lee, Jinwoo Shin
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that our simple framework brings significant performance gain under various tabular few-shot learning benchmarks, compared to prior semi- and self-supervised baselines. In this section, we validate the effectiveness of our method on few-shot tabular learning scenarios under various tabular datasets from the OpenML-CC18 benchmark (Bischl et al., 2021). Our results exhibit that STUNT consistently and significantly outperforms other methods, including unsupervised, semi- and self-supervised methods (Section 4.1). We further demonstrate that our method is even effective for few-shot multi-task learning (Section 4.2). Finally, we perform an ablation study to verify the effect of the proposed pseudo-validation scheme of our approach (Section 4.3). |
| Researcher Affiliation | Collaboration | Jaehyun Nam¹, Jihoon Tack¹, Kyungmin Lee¹, Hankook Lee², Jinwoo Shin¹ (¹Korea Advanced Institute of Science and Technology (KAIST), ²LG AI Research) |
| Pseudocode | Yes | Algorithm 1 STUNT: Self-generated Tasks from UNlabeled Tables |
| Open Source Code | Yes | Code is available at https://github.com/jaehyun513/STUNT. We provide code for reproduction in the supplementary material and describe the implementation details in Appendix B. |
| Open Datasets | Yes | We verify the effectiveness of STUNT through extensive evaluations on various datasets in the OpenML-CC18 benchmark (Vanschoren et al., 2014; Bischl et al., 2021). We select 8 datasets from the OpenML-CC18 benchmark (Bischl et al., 2021; Asuncion & Newman, 2007). |
| Dataset Splits | Yes | Common setup. For all the datasets, 80% of the data is used for training (unlabeled except for few-shot labeled samples) and 20% for testing, except for the income dataset, for which separate training and test splits are already provided. For STUNT, we use 20% of training data for pseudo-validation. (A hedged split-and-preprocessing sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., specific GPU or CPU models, memory sizes, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and libraries such as CatBoost, the Adam optimizer, and k-nearest neighbors, but does not provide specific version numbers for these or for the underlying programming languages or frameworks. |
| Experiment Setup | Yes | Common setup. For all the datasets, 80% of the data is used for training (unlabeled except for few-shot labeled samples) and 20% for testing, except for the income dataset, for which separate training and test splits are already provided. For STUNT, we use 20% of training data for pseudo-validation. We one-hot encode categorical features following the preprocessing of SubTab (Ucar et al., 2021), then apply normalization by subtracting the mean and dividing by the standard deviation for the income dataset, and min-max scaling for the other datasets. All baselines and STUNT are trained for 10K steps, while we follow the original training setting for CACTUs (Hsu et al., 2018). For all methods, we train a 2-layer multi-layer perceptron (MLP) with a hidden dimension of 1024. We provide additional information in Appendix A. (Hedged sketches of the preprocessing pipeline and the MLP backbone follow the table.) |
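The split and preprocessing protocol quoted in the Dataset Splits and Experiment Setup rows (80%/20% train/test, 20% of the training portion held out as pseudo-validation, one-hot encoding of categorical features, mean/standard-deviation normalization for the income dataset and min-max scaling elsewhere) can be sketched as follows. This is a minimal illustration, assuming scikit-learn and pandas; the function name, column arguments, and random seed are hypothetical and are not taken from the authors' released code.

```python
# Hedged sketch of the reported data protocol; library choices and names are assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

def split_and_preprocess(df: pd.DataFrame, categorical_cols, numeric_cols,
                         standardize: bool = False, seed: int = 0):
    """80/20 train/test split, then 20% of the training data as pseudo-validation."""
    train_df, test_df = train_test_split(df, test_size=0.2, random_state=seed)
    train_df, pseudo_val_df = train_test_split(train_df, test_size=0.2, random_state=seed)

    # One-hot encode categorical features (following the SubTab-style preprocessing).
    # Note: scikit-learn < 1.2 uses `sparse=False` instead of `sparse_output=False`.
    encoder = OneHotEncoder(handle_unknown="ignore", sparse_output=False)
    encoder.fit(train_df[categorical_cols])

    # income dataset: subtract mean / divide by std; other datasets: min-max scaling.
    scaler = StandardScaler() if standardize else MinMaxScaler()
    scaler.fit(train_df[numeric_cols])

    def transform(split_df):
        cat = encoder.transform(split_df[categorical_cols])
        num = scaler.transform(split_df[numeric_cols])
        return np.concatenate([num, cat], axis=1)

    return transform(train_df), transform(pseudo_val_df), transform(test_df)
```

In this sketch the encoder and scaler are fit on the training portion only, so the pseudo-validation and test splits see no statistics computed from their own rows; whether the authors fit the scalers exactly this way is not stated in the section.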
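The backbone and optimization budget from the Experiment Setup row (a 2-layer MLP with a hidden dimension of 1024, trained for 10K steps; the Software Dependencies row also mentions the Adam optimizer) can be sketched as below. PyTorch, the learning rate, the output dimension, the loss function, and the placeholder batch sampler are assumptions rather than details confirmed by the section.

```python
# Hedged sketch of the backbone/training loop; only the 2-layer MLP, the 1024 hidden
# dimension, the 10K-step budget, and the use of Adam come from the report above.
import torch
import torch.nn as nn

class MLP(nn.Module):
    """2-layer MLP backbone with a hidden dimension of 1024."""
    def __init__(self, in_dim: int, hidden_dim: int = 1024, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train(model: nn.Module, loss_fn, sample_batch, num_steps: int = 10_000, lr: float = 1e-3):
    """Generic Adam training loop; `sample_batch` is a hypothetical task/batch sampler."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_steps):
        x, y = sample_batch()  # e.g., inputs and (pseudo-)labels for one training step
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```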