On the Noise Robustness of In-Context Learning for Text Generation
Authors: Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of LPR, improving the EM score by up to 18.75 on common benchmarks with noisy annotations. |
| Researcher Affiliation | Academia | Hongfu Gao (1,2), Feipeng Zhang (2), Wenyu Jiang (1,3), Jun Shu (4), Feng Zheng (5), Hongxin Wei (1); (1) Department of Statistics and Data Science, Southern University of Science and Technology; (2) School of Economics and Finance, Xi'an Jiaotong University; (3) National Key Laboratory for Novel Software Technology, Nanjing University; (4) School of Mathematics and Statistics, Xi'an Jiaotong University; (5) Department of Computer Science and Engineering, Southern University of Science and Technology |
| Pseudocode | No | The paper describes the proposed method in prose and mathematical formulas, but it includes no clearly labeled 'Pseudocode' or 'Algorithm' block and does not present the method as structured, code-like steps. |
| Open Source Code | Yes | Our code is available at https://github.com/ml-stat-Sustech/Local-Perplexity-Ranking |
| Open Datasets | Yes | We employ 6 generation datasets for the evaluations, including Open-Domain Question Answering: NQ [22], WebQ [5]; Reading Comprehension: SQuAD [46], SCIQ [56]; Code Generation: GeoQuery [39], NL2Bash [27]. ... The train sets of these datasets are regarded as examples datasets and the test sets are used to evaluate the performance of ICL. |
| Dataset Splits | No | The paper explicitly mentions 'train sets' and 'test sets' in Appendix A.2, stating that 'The train sets of these datasets are regarded as examples datasets and the test sets are used to evaluate the performance of ICL', but it does not explicitly state the use or size of a validation split. |
| Hardware Specification | Yes | We run our experiments on NVIDIA L40 GPU. |
| Software Dependencies | No | The paper mentions using 'Llama-2-7B-Chat [49]', 'Mistral-7B [19]', 'OPT-6.7B [66]' as LLMs, a 'bert-base-uncased sentence encoder', and the 'OpenICL repository [59, 60]' for code, but it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | For hyperparameters, we set the number of neighbors k = 4 and the threshold γ = 50% by default. The details of our implementation is presented in Appendix A.2. |
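
The 'Experiment Setup' row above reports k = 4 neighbors and a threshold γ = 50% as defaults. The sketch below is an illustrative, hypothetical rendering of a local perplexity-ranking filter consistent with those settings; the function and parameter names (`local_perplexity_filter`, `embed_fn`, `perplexity_fn`) are placeholders, not taken from the paper or its repository, which should be consulted for the authors' actual LPR implementation.

```python
# Hypothetical sketch of a local perplexity-ranking filter for ICL demonstrations.
# `embed_fn` and `perplexity_fn` stand in for a sentence encoder (e.g. bert-base-uncased)
# and an LLM-based perplexity scorer; neither name comes from the paper's code.
from typing import Callable, List, Tuple

import numpy as np


def local_perplexity_filter(
    demos: List[Tuple[str, str]],                 # (input, output) demonstration pairs
    embed_fn: Callable[[str], np.ndarray],        # maps an input text to a dense vector
    perplexity_fn: Callable[[str, str], float],   # perplexity of output given input
    k: int = 4,          # number of local neighbors (default reported in the paper)
    gamma: float = 0.5,  # ranking threshold (50% default reported in the paper)
) -> List[Tuple[str, str]]:
    """Replace demonstrations whose perplexity ranks in the top `gamma`
    fraction of their local neighborhood with their lowest-perplexity neighbor."""
    embeddings = np.stack([embed_fn(x) for x, _ in demos])
    ppls = np.array([perplexity_fn(x, y) for x, y in demos])

    # Pairwise Euclidean distances between demonstration embeddings.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)

    cleaned = list(demos)
    for i in range(len(demos)):
        # k nearest neighbors of demonstration i (index 0 is i itself).
        neighbors = np.argsort(dists[i])[1 : k + 1]
        local = np.concatenate(([i], neighbors))
        # Rank of demo i's perplexity within its local neighborhood (0 = lowest).
        rank = int(np.sum(ppls[local] < ppls[i]))
        if rank / len(local) >= gamma:
            # High local perplexity rank suggests a noisy annotation:
            # substitute the lowest-perplexity neighbor in the local region.
            cleaned[i] = demos[int(local[np.argmin(ppls[local])])]
    return cleaned
```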