Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
Authors: Sungwon Han, Jinsung Yoon, Sercan Ö. Arık, Tomas Pfister
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FeatLLM on 13 different tabular datasets in low-shot regimes, showing its strong and robust performance. Our framework outperforms contemporary few-shot learning baselines across various settings. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT. |
| Researcher Affiliation | Collaboration | Work done at Google as a research intern. ¹School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea; ²Google Cloud AI, Sunnyvale, California, USA. Correspondence to: Jinsung Yoon <jinsungyoon@google.com>. |
| Pseudocode | No | The paper describes the method's steps and shows examples of prompts and generated code snippets, but it does not include a formal pseudocode block or an algorithm section labeled as such. |
| Open Source Code | Yes | The code is released on GitHub at https://github.com/Sungwon-Han/FeatLLM. |
| Open Datasets | Yes | Our experiment utilizes 13 datasets for binary or multi-class classification tasks: (1) Adult (Asuncion & Newman, 2007); (2) Bank (Moro et al., 2014); (3) Blood (Yeh et al., 2009); (4) Car (Kadra et al., 2021); (5) Communities (Redmond, 2009); (6) Credit-g (Kadra et al., 2021); (7) Diabetes; (8) Heart; and (9) Myocardial (Golovenkin & Voino-Yasenetsky, 2020). |
| Dataset Splits | Yes | We employ k-fold cross-validation for optimal epoch selection. |
| Hardware Specification | Yes | One A100 GPU is used as the default, except for TabLLM, which uses four A100 GPUs for model parallelism. |
| Software Dependencies | No | The paper mentions software like 'GPT-3.5', 'PaLM 2 Text-Unicorn model', 'T0', 'Adam optimizer', and Python's exec() function (a hedged sketch of the exec() pattern appears after the table), but it does not provide specific version numbers for any of these components or libraries, which reproducibility requires. |
| Experiment Setup | Yes | The temperature for LLM inference is set to 0.5 and the top-p value is left at the API default of 1. We set the number of ensembles to 20 and the number of rules to extract to 10. Details on hyper-parameter impacts are in Figure 6 and Appendix B. We use the Adam optimizer with a learning rate of 0.01 for the linear model, training for 200 epochs (a hedged sketch of this recipe, combined with the k-fold epoch selection noted above, follows the table). |
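
The exec()-based pattern referenced in the Software Dependencies row can be illustrated with a minimal sketch. The function name `add_features` and the generated snippet below are hypothetical stand-ins for an actual LLM response; FeatLLM's real prompts and parsing logic live in the released repository.

```python
# Minimal sketch, assuming the LLM returns a self-contained Python function.
# `add_features` and the rule below are hypothetical, not the authors' code.
import pandas as pd

llm_generated_code = """
def add_features(df):
    # An example rule an LLM might emit for the Adult dataset:
    # flag records with long working hours.
    df = df.copy()
    df["overworked"] = (df["hours_per_week"] > 45).astype(int)
    return df
"""

namespace = {}
exec(llm_generated_code, namespace)       # define the generated function
add_features = namespace["add_features"]

df = pd.DataFrame({"hours_per_week": [40, 60, 35]})
print(add_features(df))                   # new binary feature column added
```

Executing generated code in a dedicated namespace dictionary, rather than the module's globals, keeps the LLM-defined function isolated and easy to retrieve by name.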
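
Likewise, the reported training recipe can be sketched in PyTorch: a linear model trained with Adam (learning rate 0.01) for up to 200 epochs, with the epoch chosen by k-fold cross-validation and predictions averaged over 20 ensemble members. The fold count (k = 4), the choice of PyTorch, and the toy tensors are assumptions; in the paper each ensemble member scores a different set of LLM-extracted rules, which the shared toy features stand in for here.

```python
# Hedged sketch of the reported setup: linear model + Adam (lr = 0.01),
# 200 epochs, 20-member ensemble, k-fold CV for epoch selection.
# k = 4 and the toy tensors are assumptions, not the authors' exact code.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def fit_linear(X, y, n_classes=2, epochs=200, lr=0.01):
    model = nn.Linear(X.shape[1], n_classes)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model

def select_epoch(X, y, max_epochs=200, lr=0.01, k=4):
    """Mean validation loss per epoch across k folds; return best epoch."""
    losses = np.zeros(max_epochs)
    for tr, va in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        tr, va = torch.as_tensor(tr), torch.as_tensor(va)
        model = nn.Linear(X.shape[1], 2)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for e in range(max_epochs):
            opt.zero_grad()
            loss_fn(model(X[tr]), y[tr]).backward()
            opt.step()
            with torch.no_grad():
                losses[e] += loss_fn(model(X[va]), y[va]).item() / k
    return int(losses.argmin()) + 1

# Toy few-shot data: 8 shots, 10 rule-derived features, binary labels.
X, y = torch.rand(8, 10), torch.randint(0, 2, (8,))
best_epoch = select_epoch(X, y)

# Average class probabilities over 20 ensemble members (in the paper, each
# member would use features from a different LLM-extracted rule set).
probs = torch.stack([
    fit_linear(X, y, epochs=best_epoch)(X).softmax(dim=-1)
    for _ in range(20)
]).mean(dim=0)
```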