LIFT: Language-Interfaced Fine-Tuning for Non-language Machine Learning Tasks

Authors: Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To answer this, we propose Language-Interfaced Fine-Tuning (LIFT) and study its efficacy and limitations by conducting an extensive empirical study on a suite of non-language classification and regression tasks.
Researcher Affiliation | Academia | University of Wisconsin-Madison, USA
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/UW-Madison-Lee-Lab/LanguageInterfacedFineTuning.
Open Datasets | Yes | For classification, we use three types of non-language data: low-dimensional synthetic datasets, real tabular datasets in OpenML [36], and vision datasets (MNIST [37], Fashion-MNIST [38] and their permuted variants [39]). ... We also use four real datasets: Medical Insurance (Insurance) [41], Combined Cycle Power Plant (CCPP) [42], Servo [43], and Student Performance (Student) [44].
Dataset Splits | Yes | For hyperparameter selection, we apply grid search on a set of parameter values and use cross-validation on the training set (see details in Appendix C.2).
Hardware Specification | Yes | For experiments on GPT-J, we used p3.8xlarge and p3.2xlarge instances from AWS and RTX3090 GPUs in the local server.
Software Dependencies | No | The paper mentions using 'LoRA' and the 'OpenAI API' but does not specify version numbers for these or any other software dependencies in the main text.
Experiment Setup | Yes | We use the default cross-entropy loss for token prediction in LMs. Our generic template (without feature names and task description) for a sample r with p attributes is: "When we have x1=r.x1, x2=r.x2, ..., xp=r.xp, what should be y?" (the question), followed by "###" (the question/answer separator), "y = r.y" (the answer), and "@@@" (the end-of-answer marker). ... For hyperparameter selection, we apply grid search on a set of parameter values and use cross-validation on the training set (see details in Appendix C.2). ... we adjust the generation randomness by increasing the decoding temperature [33, 34, 35] from 0 (deterministic mode) to 0.75 (random mode). (A prompt-serialization sketch follows the table.)
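
To make the template concrete, below is a minimal sketch of how a tabular row could be serialized into the LIFT question/answer format and queried at the two decoding temperatures quoted above. The feature values, the fine-tuned model id, and the legacy openai-python (<1.0) Completion call are illustrative assumptions, not the authors' released code; see their repository for the actual implementation.

```python
# Minimal sketch of LIFT-style prompt serialization and inference.
# Feature values, the model id, and the legacy openai-python (<1.0)
# Completion call are illustrative assumptions, not the authors' code.
import openai

def serialize(x, y=None):
    """Render a feature vector with the generic LIFT template."""
    question = "When we have " + ", ".join(
        f"x{i + 1}={v}" for i, v in enumerate(x)
    ) + ", what should be y?"
    if y is None:
        # Inference prompt: the question plus the "###" q/a separator.
        return question + " ###"
    # Fine-tuning example: answer "y = r.y" closed by the "@@@" marker.
    return f"{question} ### y = {y} @@@"

# One training line of a prompt/completion-style fine-tuning file.
print(serialize([0.12, 3.4, -1.0], y=7))
# -> When we have x1=0.12, x2=3.4, x3=-1.0, what should be y? ### y = 7 @@@

# Query a fine-tuned model, stopping at the end-of-answer marker.
# temperature=0 is the deterministic mode; 0.75 is the random mode.
response = openai.Completion.create(
    model="ada:ft-example-2022",  # hypothetical fine-tuned model id
    prompt=serialize([0.12, 3.4, -1.0]),
    temperature=0.0,
    max_tokens=10,
    stop=["@@@"],
)
prediction = response["choices"][0]["text"].strip()
```

In a fine-tuning workflow of this shape, the serialized question up to "###" would serve as the prompt and the remainder as the completion, so the "@@@" stop sequence cleanly truncates generated answers at inference time.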