Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
Authors: Sungwon Han, Jinsung Yoon, Sercan Ö. Arık, Tomas Pfister
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FeatLLM on 13 different tabular datasets in low-shot regimes, showing its strong and robust performance. Our framework outperforms contemporary few-shot learning baselines across various settings. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT. |
| Researcher Affiliation | Collaboration | Work done at Google as a research intern. ¹School of Computing, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea; ²Google Cloud AI, Sunnyvale, California, USA. Correspondence to: Jinsung Yoon <jinsungyoon@google.com>. |
| Pseudocode | No | The paper describes the method's steps and shows examples of prompts and generated code snippets, but it does not include a formal pseudocode block or an algorithm section labeled as such. |
| Open Source Code | Yes | The code is released on GitHub at https://github.com/Sungwon-Han/FeatLLM. |
| Open Datasets | Yes | Our experiment utilizes 13 datasets for binary or multi-class classification tasks: (1) Adult (Asuncion & Newman, 2007); (2) Bank (Moro et al., 2014); (3) Blood (Yeh et al., 2009); (4) Car (Kadra et al., 2021); (5) Communities (Redmond, 2009); (6) Credit-g (Kadra et al., 2021); (7) Diabetes; (8) Heart; and (9) Myocardial (Golovenkin & Voino-Yasenetsky, 2020). |
| Dataset Splits | Yes | We employ k-fold cross-validation for optimal epoch selection. |
| Hardware Specification | Yes | One A100 GPU is used as the default, except for TabLLM, which uses four A100 GPUs for model parallelism. |
| Software Dependencies | No | The paper mentions software like 'GPT-3.5', 'PaLM 2 Text-Unicorn model', 'T0', 'Adam optimizer', and Python's exec() function (a hedged sketch of the exec() pattern appears after the table), but it does not provide specific version numbers for any of these components or libraries, which reproducibility requires. |
| Experiment Setup | Yes | The temperature for LLM inference is set to 0.5 and the top-p value is left at the API default of 1. We set the number of ensembles to 20 and the number of rules to extract to 10. Details on hyper-parameter impacts are in Figure 6 and Appendix B. We use the Adam optimizer with a learning rate of 0.01 for the linear model, training for 200 epochs (a hedged sketch of this recipe, combined with the k-fold epoch selection noted above, follows the table). |
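
The exec()-based pattern referenced in the Software Dependencies row can be illustrated with a minimal sketch. The function name `add_features` and the generated snippet below are hypothetical stand-ins for an actual LLM response; FeatLLM's real prompts and parsing logic live in the released repository.

```python
# Minimal sketch, assuming the LLM returns a self-contained Python function.
# `add_features` and the rule below are hypothetical, not the authors' code.
import pandas as pd

llm_generated_code = """
def add_features(df):
    # An example rule an LLM might emit for the Adult dataset:
    # flag records with long working hours.
    df = df.copy()
    df["overworked"] = (df["hours_per_week"] > 45).astype(int)
    return df
"""

namespace = {}
exec(llm_generated_code, namespace)       # define the generated function
add_features = namespace["add_features"]

df = pd.DataFrame({"hours_per_week": [40, 60, 35]})
print(add_features(df))                   # new binary feature column added
```

Executing generated code in a dedicated namespace dictionary, rather than the module's globals, keeps the LLM-defined function isolated and easy to retrieve by name.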
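
Likewise, the reported training recipe can be sketched in PyTorch: a linear model trained with Adam (learning rate 0.01) for up to 200 epochs, with the epoch chosen by k-fold cross-validation and predictions averaged over 20 ensemble members. The fold count (k = 4), the choice of PyTorch, and the toy tensors are assumptions; in the paper each ensemble member scores a different set of LLM-extracted rules, which the shared toy features stand in for here.

```python
# Hedged sketch of the reported setup: linear model + Adam (lr = 0.01),
# 200 epochs, 20-member ensemble, k-fold CV for epoch selection.
# k = 4 and the toy tensors are assumptions, not the authors' exact code.
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def fit_linear(X, y, n_classes=2, epochs=200, lr=0.01):
    model = nn.Linear(X.shape[1], n_classes)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model

def select_epoch(X, y, max_epochs=200, lr=0.01, k=4):
    """Mean validation loss per epoch across k folds; return best epoch."""
    losses = np.zeros(max_epochs)
    for tr, va in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        tr, va = torch.as_tensor(tr), torch.as_tensor(va)
        model = nn.Linear(X.shape[1], 2)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for e in range(max_epochs):
            opt.zero_grad()
            loss_fn(model(X[tr]), y[tr]).backward()
            opt.step()
            with torch.no_grad():
                losses[e] += loss_fn(model(X[va]), y[va]).item() / k
    return int(losses.argmin()) + 1

# Toy few-shot data: 8 shots, 10 rule-derived features, binary labels.
X, y = torch.rand(8, 10), torch.randint(0, 2, (8,))
best_epoch = select_epoch(X, y)

# Average class probabilities over 20 ensemble members (in the paper, each
# member would use features from a different LLM-extracted rule set).
probs = torch.stack([
    fit_linear(X, y, epochs=best_epoch)(X).softmax(dim=-1)
    for _ in range(20)
]).mean(dim=0)
```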