Supervised Knowledge Makes Large Language Models Better In-context Learners
Authors: Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs. We conduct experiments in both zero-shot and few-shot settings of natural language understanding and question answering (QA). Empirical results show that our method significantly outperforms LLMs and SLMs in both zero-shot and few-shot settings on 9 distinct tasks under the OOD setting we consider. |
| Researcher Affiliation | Collaboration | Linyi Yang (1,2), Shuibai Zhang (1), Zhuohao Yu (3), Guangsheng Bao (1), Yidong Wang (3), Jindong Wang (4), Ruochen Xu (4), Wei Ye (3), Xing Xie (4), Weizhu Chen (4), Yue Zhang (1,2). Affiliations: 1 School of Engineering, Westlake University; 2 Westlake Institute for Advanced Study; 3 Peking University; 4 Microsoft |
| Pseudocode | Yes | Algorithm 1: SuperContext for Natural Language Understanding (see the sketch after this table) |
| Open Source Code | Yes | The code and data are released at: https://github.com/YangLinyi/Supervised-Knowledge-Makes-Large-Language-Models-Better-In-context-Learners |
| Open Datasets | Yes | SuperContext is validated on a comprehensive OOD benchmark, GLUE-X (Yang et al., 2022), and a QA dataset, SQuAD 2.0 (Rajpurkar et al., 2018). |
| Dataset Splits | Yes | The in-context examples are extracted from the training set and LLMs are evaluated on the validation set. |
| Hardware Specification | No | The paper mentions the use of specific models like Llama 2, ChatGPT, ELECTRA-large, and RoBERTa-large, but it does not provide details about the hardware (e.g., specific GPU models, CPU types, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions specific model APIs (gpt-3.5-turbo, text-davinci-003, gpt-3.5-turbo-16k) and setting the temperature, but it does not list general software dependencies like programming languages, libraries, or frameworks with specific version numbers (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | No | The paper describes prompt designs and few-shot settings (e.g., 16-shot ICL), but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training configurations for their models. |
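For readers attempting reproduction, the following is a minimal sketch of the SuperContext recipe named in Algorithm 1: in-context examples are drawn from the training set (16-shot ICL, per the dataset-split and experiment-setup rows), a fine-tuned SLM's prediction and confidence are injected into the prompt as supervised knowledge, and the augmented prompt is sent to gpt-3.5-turbo with the temperature fixed. The function names (`slm_predict`, `build_prompt`, `query_llm`), the prompt wording, the dummy SLM output, and the choice of temperature 0 are our illustrative assumptions, not the authors' released code; only the overall recipe comes from the paper.

```python
"""Minimal sketch of the SuperContext recipe (Algorithm 1), under stated
assumptions. The linked repository is the authoritative implementation;
all function names and prompt wording here are hypothetical."""
import random

from openai import OpenAI  # OpenAI Python client >= 1.0 assumed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def slm_predict(text: str) -> tuple[str, float]:
    """Placeholder for a fine-tuned SLM classifier (e.g., ELECTRA-large).
    Returns (predicted_label, confidence); swap in a real model here."""
    return "entailment", 0.93  # dummy output for illustration only


def build_prompt(test_input: str, train_set: list[dict], k: int = 16) -> str:
    """Compose a k-shot prompt: in-context examples drawn from the training
    set, then the SLM's prediction and confidence for the test input
    (the supervised knowledge), then the query itself."""
    demos = random.sample(train_set, k)  # assumes len(train_set) >= k
    parts = [f"Input: {d['text']}\nLabel: {d['label']}" for d in demos]
    label, confidence = slm_predict(test_input)
    parts.append(
        f"A smaller model fine-tuned on this task predicts '{label}' "
        f"with confidence {confidence:.2f}. Taking this into account:"
    )
    parts.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(parts)


def query_llm(prompt: str) -> str:
    """Send the augmented prompt to gpt-3.5-turbo. The paper reports
    setting the temperature; 0 is our assumption for determinism."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

This sketch only mirrors the setup the table attributes to the paper (training-set demonstrations, validation-set evaluation, SLM prediction plus confidence as supplementary knowledge); the released repository linked above should be consulted for the actual prompts and hyperparameters.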