LLM-Assisted Code Cleaning For Training Accurate Code Generators
Authors: Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CODELLAMA-7B on our transformed modularized programs improves the performance by up to 30% compared to fine-tuning on the original dataset. |
| Researcher Affiliation | Academia | Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen & Ion Stoica University of California, Berkeley {naman_jain,tianjunz,weichiang,jegonzal,ksen,istoica}@berkeley.edu |
| Pseudocode | No | The paper describes the steps for data cleaning and transformation in text, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper refers to using the CODELLAMA-7B model checkpoint from Hugging Face and VLLM for inference, but does not state that the authors release their own code for the described methodology. |
| Open Datasets | Yes | We use two standard algorithmic code generation benchmarks, APPS and CODE-CONTESTS. The benchmarks provide a collection of problem statements described in natural language and corresponding test cases. |
| Dataset Splits | No | Table 1: Details about the number of problems, the median number of test cases per problem, and the number of solutions in the APPS and CODE-CONTESTS datasets. (The table lists only 'train' and 'test' counts for problems and solutions; no validation split is reported.) |
| Hardware Specification | Yes | We train the models for two epochs on the APPS dataset and one epoch on the CODE-CONTESTS dataset using a 5e-5 learning rate and an effective batch size of 256 on 4 A6000 GPUs. |
| Software Dependencies | No | The paper mentions using CODELLAMA-7B model, Hugging Face, VLLM, DeepSpeed, and gaoya, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train the models for two epochs on the APPS dataset and one epoch on the CODE-CONTESTS dataset using a 5e-5 learning rate and an effective batch size of 256 on 4 A6000 GPUs. |
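
For reference, the reported fine-tuning configuration (CODELLAMA-7B, 5e-5 learning rate, effective batch size of 256 on 4 A6000 GPUs, two epochs on APPS) can be approximated with the Hugging Face stack the paper mentions. The sketch below is an illustration, not the authors' code: the per-device batch size / gradient accumulation split, the output path, and the commented-out DeepSpeed config are assumptions, since the paper reports only the effective batch size and does not release its training scripts.

```python
# Illustrative sketch only (not the authors' released code): the reported
# hyperparameters expressed with Hugging Face Transformers, which the paper
# says it uses alongside DeepSpeed and VLLM.
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments

model_name = "codellama/CodeLlama-7b-hf"  # CODELLAMA-7B checkpoint on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    output_dir="codellama7b-apps-modularized",  # hypothetical output path
    num_train_epochs=2,              # 2 epochs on APPS (1 epoch on CODE-CONTESTS)
    learning_rate=5e-5,              # reported learning rate
    per_device_train_batch_size=16,  # assumption: 4 GPUs x 16 x 4 accumulation steps
    gradient_accumulation_steps=4,   #   = effective batch size 256, as reported
    save_strategy="epoch",
    logging_steps=50,
    # deepspeed="ds_config.json",    # the paper mentions DeepSpeed; no config is provided
)

# A transformers.Trainer would then be constructed from `model`, `training_args`,
# and the tokenized (problem statement, transformed modularized solution) training
# split before calling `.train()`.
```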