Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
Authors: Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive evaluation of 15 LLMs shows that FIM pretraining not only enhances FIM proficiency but also improves Left-to-Right (L2R) inference using LLMs. |
| Researcher Affiliation | Collaboration | ¹Department of EECS, University of California at Berkeley, Berkeley, California, USA; ²AI at Meta, USA. |
| Pseudocode | No | The paper includes code examples in figures but does not contain pseudocode or explicitly labeled algorithm blocks. |
| Open Source Code | Yes | The evaluation toolkit and dataset are available at https://github.com/gonglinyuan/safim |
| Open Datasets | Yes | The evaluation toolkit and dataset are available at https://github.com/gonglinyuan/safim |
| Dataset Splits | No | The paper does not provide training/validation/test splits for its own benchmark setup; the entire SAFIM dataset serves as a test set for evaluating the LLMs. |
| Hardware Specification | No | The paper mentions using the OpenAI API and the Huggingface transformers library for generation, but does not specify the hardware used for these runs beyond general mentions of "computational resources". |
| Software Dependencies | No | The paper mentions using the OpenAI API and the Huggingface transformers library, but does not provide version numbers for these software dependencies. |
| Experiment Setup | Yes | For the remaining models, generation is conducted via the Huggingface transformers library, following established practices in Fried et al. (2023), where we use top-p random sampling with p = 0.95 and a temperature of 0.2. |
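The generation setup quoted above can be approximated with a minimal sketch (not the authors' exact harness from the SAFIM toolkit): a fill-in-the-middle completion via Huggingface transformers using the reported sampling settings (top-p = 0.95, temperature = 0.2). The model name and FIM sentinel tokens below are illustrative assumptions; the toolkit at https://github.com/gonglinyuan/safim defines the actual per-model prompt formats.

```python
# Sketch of FIM-style generation with the paper's reported sampling settings.
# Assumptions: StarCoder-style FIM sentinels and an example model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigcode/starcoder"  # assumption: any FIM-capable causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Hypothetical prompt: code prefix and suffix surrounding the masked middle.
prefix = "def add(a, b):\n    return "
suffix = "\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,      # top-p random sampling, as reported in the paper
    top_p=0.95,
    temperature=0.2,
    max_new_tokens=64,
)
# Decode only the newly generated tokens (the predicted middle).
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```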