On the Noise Robustness of In-Context Learning for Text Generation
Authors: Hongfu Gao, Feipeng Zhang, Wenyu Jiang, Jun Shu, Feng Zheng, Hongxin Wei
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of LPR, improving the EM score by up to 18.75 on common benchmarks with noisy annotations. |
| Researcher Affiliation | Academia | Hongfu Gao (1,2), Feipeng Zhang (2), Wenyu Jiang (1,3), Jun Shu (4), Feng Zheng (5), Hongxin Wei (1); (1) Department of Statistics and Data Science, Southern University of Science and Technology; (2) School of Economics and Finance, Xi'an Jiaotong University; (3) National Key Laboratory for Novel Software Technology, Nanjing University; (4) School of Mathematics and Statistics, Xi'an Jiaotong University; (5) Department of Computer Science and Engineering, Southern University of Science and Technology |
| Pseudocode | No | The paper describes the proposed method in prose and mathematical formulas, but it includes no clearly labeled 'Pseudocode' or 'Algorithm' block and does not present the method as structured, code-like steps. |
| Open Source Code | Yes | Our code is available at https://github.com/ml-stat-Sustech/Local-Perplexity-Ranking |
| Open Datasets | Yes | We employ 6 generation datasets for the evaluations, including Open-Domain Question Answering: NQ [22], WebQ [5]; Reading Comprehension: SQuAD [46], SCIQ [56]; Code Generation: GeoQuery [39], NL2Bash [27]. ... The train sets of these datasets are regarded as examples datasets and the test sets are used to evaluate the performance of ICL. |
| Dataset Splits | No | The paper explicitly mentions 'train sets' and 'test sets' in Appendix A.2, stating that 'The train sets of these datasets are regarded as examples datasets and the test sets are used to evaluate the performance of ICL', but it does not explicitly state the use or size of a validation split. |
| Hardware Specification | Yes | We run our experiments on NVIDIA L40 GPU. |
| Software Dependencies | No | The paper mentions using 'Llama-2-7B-Chat [49]', 'Mistral-7B [19]', 'OPT-6.7B [66]' as LLMs, a 'bert-base-uncased sentence encoder', and the 'OpenICL repository [59, 60]' for code, but it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | For hyperparameters, we set the number of neighbors k = 4 and the threshold γ = 50% by default. The details of our implementation is presented in Appendix A.2. |
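
The 'Experiment Setup' row above reports k = 4 neighbors and a threshold γ = 50% as defaults. The sketch below is an illustrative, hypothetical rendering of a local perplexity-ranking filter consistent with those settings; the function and parameter names (`local_perplexity_filter`, `embed_fn`, `perplexity_fn`) are placeholders, not taken from the paper or its repository, which should be consulted for the authors' actual LPR implementation.

```python
# Hypothetical sketch of a local perplexity-ranking filter for ICL demonstrations.
# `embed_fn` and `perplexity_fn` stand in for a sentence encoder (e.g. bert-base-uncased)
# and an LLM-based perplexity scorer; neither name comes from the paper's code.
from typing import Callable, List, Tuple

import numpy as np


def local_perplexity_filter(
    demos: List[Tuple[str, str]],                 # (input, output) demonstration pairs
    embed_fn: Callable[[str], np.ndarray],        # maps an input text to a dense vector
    perplexity_fn: Callable[[str, str], float],   # perplexity of output given input
    k: int = 4,          # number of local neighbors (default reported in the paper)
    gamma: float = 0.5,  # ranking threshold (50% default reported in the paper)
) -> List[Tuple[str, str]]:
    """Replace demonstrations whose perplexity ranks in the top `gamma`
    fraction of their local neighborhood with their lowest-perplexity neighbor."""
    embeddings = np.stack([embed_fn(x) for x, _ in demos])
    ppls = np.array([perplexity_fn(x, y) for x, y in demos])

    # Pairwise Euclidean distances between demonstration embeddings.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)

    cleaned = list(demos)
    for i in range(len(demos)):
        # k nearest neighbors of demonstration i (index 0 is i itself).
        neighbors = np.argsort(dists[i])[1 : k + 1]
        local = np.concatenate(([i], neighbors))
        # Rank of demo i's perplexity within its local neighborhood (0 = lowest).
        rank = int(np.sum(ppls[local] < ppls[i]))
        if rank / len(local) >= gamma:
            # High local perplexity rank suggests a noisy annotation:
            # substitute the lowest-perplexity neighbor in the local region.
            cleaned[i] = demos[int(local[np.argmin(ppls[local])])]
    return cleaned
```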