CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

Authors: Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven Chu Hong Hoi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method not only achieves new SOTA results on the challenging APPS benchmark, but also shows strong zero-shot transfer capability with new SOTA results on the simpler MBPP benchmark. Our comprehensive experiments show that our models can achieve SOTA performance on the challenging APPS benchmark [Hendrycks et al., 2021].
Researcher Affiliation | Industry | Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C.H. Hoi (Salesforce Research)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Salesforce Research: https://github.com/salesforce/CodeRL
Open Datasets | Yes | We choose the challenging APPS program synthesis benchmark [Hendrycks et al., 2021], as it has large coding problems of varying difficulties collected from multiple coding websites. Finally, we test the zero-shot transfer ability of CodeRL on another smaller and simpler program synthesis benchmark, MBPP [Austin et al., 2021]. We enlarge the Python pretraining dataset using the recently released large-scale GitHub Code dataset (https://huggingface.co/datasets/lvwerra/github-code). See the dataset-loading sketch below the table.
Dataset Splits | No | The paper mentions 'training and test splits' for the APPS benchmark but does not explicitly describe a validation split or how one was used.
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See the configurations in Appendix ??. From Appendix D.1 (Experimental Details): For training, we used 16 NVIDIA A100 80GB GPUs. For inference, we use 4 NVIDIA V100 32GB GPUs.
Software Dependencies | No | The paper mentions various software components and models (e.g., CodeT5, GPT models, Python) but does not provide specific version numbers for them.
Experiment Setup | Yes | Finetuning Setup... we applied imitation learning to first warm-start a pretrained LM model with L_ce only for up to 10 epochs... After training the critic, we then apply both L_ce and L_rl with equal weights to finetune the actor network. We use nucleus sampling with a batch size of N = 200. See the finetuning and sampling sketch below the table.
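
For the Open Datasets row, the following is a minimal sketch, not code from the paper, of pulling Python files from the github-code dataset linked above via the Hugging Face `datasets` library. The field names ("code", "language") and the streaming flag follow the dataset card and are assumptions here; depending on your `datasets` version, the loading script may also require `trust_remote_code=True`.

```python
from datasets import load_dataset

# Stream the dataset to avoid downloading the full corpus up front.
ds = load_dataset("lvwerra/github-code", split="train", streaming=True)

# Keep only Python files, mirroring the paper's enlarged Python pretraining subset.
python_files = (ex["code"] for ex in ds if ex["language"] == "Python")

# Peek at the first few files.
for i, source in enumerate(python_files):
    print(source[:200])
    if i >= 2:
        break
```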
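
For the Experiment Setup row, here is a minimal sketch, not the authors' implementation, of the two points quoted there: warm-starting the actor with the cross-entropy loss L_ce alone before adding L_rl with equal weight, and drawing N = 200 candidate programs per problem with nucleus sampling from a CodeT5 checkpoint. The RL term is left as a placeholder callback (CodeRL's actual L_rl uses critic-weighted returns from unit-test feedback), and the top_p and temperature values below are assumptions, not the paper's reported settings.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
actor = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def actor_loss(batch, rl_loss_fn=None):
    """Warm-start stage: L = L_ce. Second stage: L = L_ce + L_rl (equal weights)."""
    out = actor(input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"])
    loss = out.loss                    # token-level cross-entropy, L_ce
    if rl_loss_fn is not None:         # placeholder for CodeRL's critic-weighted L_rl
        loss = loss + rl_loss_fn(out.logits, batch)
    return loss

# Nucleus sampling: N = 200 candidate programs for one problem description.
prompt = tokenizer("Write a function that reverses a string.", return_tensors="pt")
with torch.no_grad():
    candidates = actor.generate(
        **prompt,
        do_sample=True,
        top_p=0.95,                # assumed nucleus threshold
        temperature=0.6,           # assumed temperature
        max_new_tokens=256,
        num_return_sequences=200,  # batch size N = 200 from the quoted setup
    )
programs = tokenizer.batch_decode(candidates, skip_special_tokens=True)
```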