Supervised Knowledge Makes Large Language Models Better In-context Learners
Authors: Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs. We conduct experiments in both zero-shot and few-shot settings of natural language understanding and question answering (QA). Empirical results show that our method significantly outperforms LLMs and SLMs in both zero-shot and few-shot settings on 9 distinct tasks under the OOD setting we consider. |
| Researcher Affiliation | Collaboration | Linyi Yang (1,2), Shuibai Zhang (1), Zhuohao Yu (3), Guangsheng Bao (1), Yidong Wang (3), Jindong Wang (4), Ruochen Xu (4), Wei Ye (3), Xing Xie (4), Weizhu Chen (4), Yue Zhang (1,2). Affiliations: 1 School of Engineering, Westlake University; 2 Westlake Institute for Advanced Study; 3 Peking University; 4 Microsoft |
| Pseudocode | Yes | Algorithm 1: SuperContext for Natural Language Understanding (see the sketch after this table) |
| Open Source Code | Yes | The code and data are released at: https://github.com/YangLinyi/Supervised-Knowledge-Makes-Large-Language-Models-Better-In-context-Learners |
| Open Datasets | Yes | SuperContext is validated on a comprehensive OOD benchmark, GLUE-X (Yang et al., 2022), and a QA dataset, SQuAD 2.0 (Rajpurkar et al., 2018). |
| Dataset Splits | Yes | The in-context examples are extracted from the training set and LLMs are evaluated on the validation set. |
| Hardware Specification | No | The paper mentions the use of specific models like Llama 2, ChatGPT, ELECTRA-large, and RoBERTa-large, but it does not provide details about the hardware (e.g., specific GPU models, CPU types, or cloud computing resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions specific model APIs (gpt-3.5-turbo, text-davinci-003, gpt-3.5-turbo-16k) and setting the temperature, but it does not list general software dependencies like programming languages, libraries, or frameworks with specific version numbers (e.g., Python 3.x, PyTorch x.x). |
| Experiment Setup | No | The paper describes prompt designs and few-shot settings (e.g., 16-shot ICL), but it does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training configurations for their models. |
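For readers attempting reproduction, the following is a minimal sketch of the SuperContext recipe named in Algorithm 1: in-context examples are drawn from the training set (16-shot ICL, per the dataset-split and experiment-setup rows), a fine-tuned SLM's prediction and confidence are injected into the prompt as supervised knowledge, and the augmented prompt is sent to gpt-3.5-turbo with the temperature fixed. The function names (`slm_predict`, `build_prompt`, `query_llm`), the prompt wording, the dummy SLM output, and the choice of temperature 0 are our illustrative assumptions, not the authors' released code; only the overall recipe comes from the paper.

```python
"""Minimal sketch of the SuperContext recipe (Algorithm 1), under stated
assumptions. The linked repository is the authoritative implementation;
all function names and prompt wording here are hypothetical."""
import random

from openai import OpenAI  # OpenAI Python client >= 1.0 assumed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def slm_predict(text: str) -> tuple[str, float]:
    """Placeholder for a fine-tuned SLM classifier (e.g., ELECTRA-large).
    Returns (predicted_label, confidence); swap in a real model here."""
    return "entailment", 0.93  # dummy output for illustration only


def build_prompt(test_input: str, train_set: list[dict], k: int = 16) -> str:
    """Compose a k-shot prompt: in-context examples drawn from the training
    set, then the SLM's prediction and confidence for the test input
    (the supervised knowledge), then the query itself."""
    demos = random.sample(train_set, k)  # assumes len(train_set) >= k
    parts = [f"Input: {d['text']}\nLabel: {d['label']}" for d in demos]
    label, confidence = slm_predict(test_input)
    parts.append(
        f"A smaller model fine-tuned on this task predicts '{label}' "
        f"with confidence {confidence:.2f}. Taking this into account:"
    )
    parts.append(f"Input: {test_input}\nLabel:")
    return "\n\n".join(parts)


def query_llm(prompt: str) -> str:
    """Send the augmented prompt to gpt-3.5-turbo. The paper reports
    setting the temperature; 0 is our assumption for determinism."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

This sketch only mirrors the setup the table attributes to the paper (training-set demonstrations, validation-set evaluation, SLM prediction plus confidence as supplementary knowledge); the released repository linked above should be consulted for the actual prompts and hyperparameters.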