Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

Authors: Sungwon Han, Jinsung Yoon, Sercan O Arik, Tomas Pfister

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate FeatLLM on 13 different tabular datasets in low-shot regimes, showing its strong and robust performance. Our framework outperforms contemporary few-shot learning baselines across various settings. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT.
Researcher Affiliation | Collaboration | Work done at Google as a research intern. School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea; Google Cloud AI, Sunnyvale, California, USA. Correspondence to: Jinsung Yoon <jinsungyoon@google.com>.
Pseudocode | No | The paper describes the method's steps and shows examples of prompts and generated code snippets, but it does not include a formal pseudocode block or an algorithm section labeled as such.
Open Source Code | Yes | The code is released via an anonymized GitHub link at https://github.com/Sungwon-Han/FeatLLM.
Open Datasets | Yes | Our experiment utilizes 13 datasets for binary or multi-class classification tasks: (1) Adult (Asuncion & Newman, 2007); (2) Bank (Moro et al., 2014); (3) Blood (Yeh et al., 2009); (4) Car (Kadra et al., 2021); (5) Communities (Redmond, 2009); (6) Credit-g (Kadra et al., 2021); (7) Diabetes; (8) Heart; and (9) Myocardial (Golovenkin & Voino-Yasenetsky, 2020).
Dataset Splits | Yes | We employ k-fold cross-validation for optimal epoch selection.
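The epoch-selection scheme above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the value of k, the shuffling scheme, and the helper names (`kfold_indices`, `select_epoch`) are assumptions for illustration.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k folds (a sketch;
    the paper does not specify k or the shuffling scheme)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, k)

def select_epoch(val_acc_per_fold):
    """Pick the epoch whose validation accuracy, averaged over folds,
    is highest. `val_acc_per_fold` has shape (k, n_epochs)."""
    return int(np.asarray(val_acc_per_fold).mean(axis=0).argmax())

folds = kfold_indices(40, 4)
best_epoch = select_epoch([[0.5, 0.7], [0.6, 0.8]])  # averaged: [0.55, 0.75]
```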
Hardware Specification | Yes | One A100 GPU is used as the default, except for TabLLM, which uses four A100 GPUs for model parallelism.
Software Dependencies | No | The paper mentions software such as GPT-3.5, the PaLM 2 Text-Unicorn model, T0, the Adam optimizer, and Python's exec() function, but it does not provide specific version numbers for any of these components or libraries, which reproducibility requires.
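The mention of Python's exec() refers to running LLM-generated feature-extraction code. A minimal sketch of that pattern is shown below; the generated snippet, the rule thresholds, and the `extract_features` name are illustrative assumptions, not the paper's actual generated code.

```python
# Hypothetical code string an LLM might emit as feature-extraction rules
# (e.g., for the Adult dataset); this is an assumption for illustration.
generated_code = """
def extract_features(row):
    # Binary rule features derived from raw columns
    return [1.0 if row["age"] > 40 else 0.0,
            1.0 if row["hours_per_week"] > 45 else 0.0]
"""

namespace = {}
exec(generated_code, namespace)  # defines extract_features in `namespace`
features = namespace["extract_features"]({"age": 52, "hours_per_week": 60})
```

Executing into a dedicated namespace dict keeps the generated definitions separate from the caller's globals.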
Experiment Setup | Yes | The temperature for LLM inference is set to 0.5 and the top-p value to the API default of 1. We set the number of ensembles to 20 and the number of rules extracted to 10. Details on hyper-parameter impacts are in Figure 6 and Appendix B. We use the Adam optimizer with a learning rate of 0.01 for the linear model, training for 200 epochs.
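The reported training hyper-parameters (Adam, learning rate 0.01, 200 epochs) can be sketched with a hand-rolled Adam update on a logistic-regression layer. The model, data, and Adam betas below are placeholder assumptions; only the learning rate, epoch count, and the ensemble/rule counts come from the section.

```python
import numpy as np

# Values reported in the section
LR, EPOCHS = 0.01, 200
N_ENSEMBLES, N_RULES = 20, 10
TEMPERATURE, TOP_P = 0.5, 1.0

def train_linear_adam(X, y, lr=LR, epochs=EPOCHS):
    """Train a logistic-regression weight vector with Adam
    (default betas assumed; the paper does not state them)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    m, v = np.zeros_like(w), np.zeros_like(w)
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, epochs + 1):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid predictions
        g = X.T @ (p - y) / len(y)         # gradient of log-loss
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        w -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return w

# Toy usage on linearly separable data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
w = train_linear_adam(X, y)
accuracy = ((X @ w > 0) == y.astype(bool)).mean()
```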