Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks

Authors: Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive evaluation of 15 LLMs shows that FIM pretraining not only enhances FIM proficiency but also improves Left-to-Right (L2R) inference.
Researcher Affiliation | Collaboration | ¹Department of EECS, University of California at Berkeley, Berkeley, California, USA; ²AI at Meta, USA.
Pseudocode | No | The paper includes code examples in figures but does not contain pseudocode or explicitly labeled algorithm blocks.
Open Source Code | Yes | The evaluation toolkit and dataset are available at https://github.com/gonglinyuan/safim.
Open Datasets | Yes | The evaluation toolkit and dataset are available at https://github.com/gonglinyuan/safim.
Dataset Splits | No | The paper does not provide training/validation/test splits for its own benchmark; the entire SAFIM dataset is used for evaluating the LLMs.
Hardware Specification | No | The paper mentions using the OpenAI API and the Hugging Face transformers library for generation, but does not specify the hardware used for these operations or for the authors' own experiments beyond general mentions of "computational resources".
Software Dependencies | No | The paper mentions using the OpenAI API and the Hugging Face transformers library, but does not provide version numbers for these software dependencies.
Experiment Setup | Yes | For the remaining models, generation is conducted via the Hugging Face transformers library, following established practices in Fried et al. (2023): top-p random sampling with p = 0.95 and a temperature of 0.2 (see the generation sketch below this table).
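
To make the reported decoding settings concrete, here is a minimal sketch of sampling-based generation with the Hugging Face transformers library using the quoted parameters (top-p = 0.95, temperature = 0.2). This is not the authors' harness: the checkpoint, prompt, and token budget below are placeholder assumptions for illustration only.

# Minimal sketch (assumptions noted inline), not the authors' evaluation harness.
# It only demonstrates the decoding settings quoted in the table above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # placeholder checkpoint, chosen for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "def add(a, b):\n    return "  # placeholder completion-style prompt
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,      # enables top-p random sampling
    top_p=0.95,          # nucleus sampling threshold from the paper's setup
    temperature=0.2,     # temperature from the paper's setup
    max_new_tokens=64,   # length budget is an assumption, not stated in the table
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model-specific FIM prompt construction and output post-processing are handled by the released toolkit at the repository linked above.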