LeDex: Training LLMs to Better Self-Debug and Explain Code

Authors: Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Soneya Hossain, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories with a novel reward design considering code explanation and refinement quality. SFT improves the pass@1 by up to 15.92% and pass@10 by 9.30% over four benchmarks. RL training brings additional up to 3.54% improvement on pass@1 and 2.55% improvement on pass@10.
Researcher Affiliation | Collaboration | 1 Purdue University, 2 AWS AI Labs, 3 University of Virginia
Pseudocode | No | The paper refers to the PPO algorithm in Appendix A.3 and provides mathematical formulations, but no explicit pseudocode or algorithm blocks are presented.
Open Source Code | No | The paper does not contain an explicit statement about releasing the code for the described methodology or a direct link to a code repository.
Open Datasets | Yes | We use MBPP [3] (only use the 374 problems in the training set during training), APPS [4] (only use the 5,000 problems in the training set) and CodeContests [2] as our base training datasets, which contain programming problems and solutions collected from various platforms.
Dataset Splits | No | For supervised fine-tuning, we fine-tune three LLMs (StarCoder-15B, Code Llama-7B, and Code Llama-13B) using the correct initial solutions and correct refinements collected from the MBPP training set, APPS training set, and CodeContests.
Hardware Specification | Yes | Both the supervised fine-tuning and reinforcement learning are conducted on 8 NVIDIA A100 GPUs, each with 40GB of memory.
Software Dependencies | No | The paper mentions software components such as 'AdamW [24]', the 'TRL [25] library', and 'RoBERTa(e)' for calculating sentiment similarity, but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | For supervised fine-tuning, we fine-tune three LLMs (StarCoder-15B, Code Llama-7B, and Code Llama-13B) using the correct initial solutions and correct refinements collected from the MBPP training set, APPS training set, and CodeContests. The model is fine-tuned for two epochs, using a batch size of 128. The optimizer is AdamW [24] with learning rate set to 2e-5. The learning rate is adjusted using a warmup of 500 steps and then decayed following a cosine scheduler.
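
To make the Experiment Setup row concrete, here is a minimal sketch of the reported SFT hyperparameters expressed as Hugging Face TrainingArguments. Only the two epochs, effective batch size of 128, AdamW optimizer, 2e-5 learning rate, 500 warmup steps, and cosine decay come from the paper; the checkpoint name, the per-device/gradient-accumulation split across 8 GPUs, the bf16 flag, and the placeholder dataset are assumptions for illustration, not the authors' released code.

```python
# Hedged sketch of the reported SFT setup. Paper-reported values: 2 epochs,
# batch size 128, AdamW, lr 2e-5, 500 warmup steps, cosine decay.
# Assumptions: checkpoint id, per-device batch / grad-accum split, bf16, dummy data.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, default_data_collator)

model_name = "codellama/CodeLlama-7b-hf"  # assumption: one of the paper's base models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder for the collected SFT data (correct initial solutions and refinements).
enc = tokenizer("def add(a, b):\n    return a + b\n", truncation=True, max_length=1024)
sft_dataset = Dataset.from_dict({
    "input_ids": [enc["input_ids"]],
    "attention_mask": [enc["attention_mask"]],
    "labels": [enc["input_ids"]],
})

args = TrainingArguments(
    output_dir="ledex-sft",
    num_train_epochs=2,              # paper: two epochs
    per_device_train_batch_size=2,   # assumption: 8 GPUs x 2 x 8 accumulation = 128
    gradient_accumulation_steps=8,
    optim="adamw_torch",             # paper: AdamW
    learning_rate=2e-5,              # paper: 2e-5
    warmup_steps=500,                # paper: 500 warmup steps
    lr_scheduler_type="cosine",      # paper: cosine decay
    bf16=True,                       # assumption: mixed precision on A100s
    logging_steps=10,
)

trainer = Trainer(model=model, args=args, train_dataset=sft_dataset,
                  data_collator=default_data_collator)
trainer.train()
```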
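The RL stage is reported to use PPO (Appendix A.3) together with the TRL library. The sketch below assumes TRL's older PPOTrainer interface (roughly the pre-0.12 API); the reward function is a hypothetical placeholder standing in for the paper's reward over execution feedback, explanation quality, and refinement quality, and the checkpoint name, learning rate, and batch sizes are likewise assumptions.

```python
# Hedged sketch of one PPO update with TRL's older PPOTrainer API (pre-0.12).
# The reward function below is a placeholder, not the paper's reward design.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "codellama/CodeLlama-7b-hf"  # assumption: in practice, start from the SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

config = PPOConfig(model_name=model_name, learning_rate=1e-6,  # lr is an assumption
                   batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

def reward_fn(response_text: str) -> float:
    """Placeholder reward: the paper scores unit-test outcomes plus explanation and
    refinement quality; here we only check that a function definition was produced."""
    return 1.0 if "def " in response_text else 0.0

prompt = ("Explain why this Python function is wrong, then write a corrected version:\n"
          "def add(a, b):\n    return a - b\n")
query_tensors = [tokenizer(prompt, return_tensors="pt").input_ids.squeeze(0)]

response_tensors = ppo_trainer.generate(
    query_tensors, return_prompt=False, max_new_tokens=256,
    do_sample=True, pad_token_id=tokenizer.eos_token_id)
rewards = [torch.tensor(reward_fn(tokenizer.decode(r))) for r in response_tensors]

stats = ppo_trainer.step(query_tensors, response_tensors, rewards)  # one PPO update
```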