CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive ablation studies with different methods of prompting, number of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.
Researcher Affiliation | Industry | Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty — Salesforce Research — {hungle, hailin.chen, amrita.saha}@salesforce.com
Pseudocode | No | The paper includes diagrams and examples of prompts (Figures 2, 3, 4, 10), but it does not contain a formally labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | https://github.com/SalesforceAIResearch/CodeChain
Open Datasets | Yes | We demonstrate the efficacy of CodeChain on challenging code generation tasks, specifically, on two major benchmarks: APPS (Hendrycks et al., 2021), and CodeContests (Li et al., 2022).
Dataset Splits | Yes | On APPS and CodeContests, we reported the results on the test split following the best self-revision round performance on the validation set. ... Note that on APPS, as the original benchmark does not include a specific validation split, we randomly selected samples from the original training split and reported validation results on this set.
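The quoted split procedure (carving a validation set out of the APPS training split at random) can be sketched as below. The function name, seed, and split size are illustrative assumptions; the paper does not report these values.

```python
import random

def make_validation_split(train_problems, num_val, seed=0):
    # APPS ships without an official validation split, so a validation
    # set is sampled at random from the original training split.
    # `seed` and `num_val` are hypothetical; the paper does not state them.
    rng = random.Random(seed)
    indices = list(range(len(train_problems)))
    rng.shuffle(indices)
    val = [train_problems[i] for i in indices[:num_val]]
    train = [train_problems[i] for i in indices[num_val:]]
    return train, val
```

Fixing the seed keeps the held-out set stable across self-revision rounds, so the "best round on validation" selection compares like with like.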
Hardware Specification | No | The paper specifies the base language models used (OpenAI's GPT3.5 and GPT4, WizardCoder) and that Hugging Face-hosted model parameters and vLLM were utilized for generation. However, it does not provide specific details on the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | Yes | We applied CodeChain to both open-sourced and closed-sourced pretrained LLMs, including OpenAI's GPT3.5 and GPT4 (Koubaa, 2023), and WizardCoder (Luo et al., 2023). ... For WizardCoder, we utilized the Hugging Face-hosted model parameters (Wolf et al., 2019) and vLLM (Kwon et al., 2023) to generate programs. ... we chose to use StarEncoder (Li et al., 2023) to embed sampled sub-modules throughout all experiments.
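CodeChain's core step clusters the StarEncoder embeddings of sampled sub-modules and carries one representative per cluster into the next revision round. A toy version of the representative-selection step, assuming embeddings and cluster labels are already computed, might look like this; picking the member closest to the cluster centroid is an illustrative rule, not a detail quoted in this report.

```python
from collections import defaultdict

def representative_per_cluster(embeddings, labels):
    # Group sub-module embeddings by cluster label, then pick the
    # member nearest (squared Euclidean distance) to its cluster
    # centroid as that cluster's representative.
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    reps = {}
    for lab, members in groups.items():
        dim = len(embeddings[members[0]])
        centroid = [
            sum(embeddings[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        reps[lab] = min(
            members,
            key=lambda i: sum((embeddings[i][d] - centroid[d]) ** 2 for d in range(dim)),
        )
    return reps  # cluster label -> index of representative sub-module
```

In the actual pipeline, the representatives' source code would be placed back into the revision prompt for the next round.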
Experiment Setup | Yes | To apply CodeChain, we fixed the budget in each generation/revision round to N = 20 generation samples per problem. ... We adopted a default temperature of 0.6 to generate output tokens and a max output length of 2048 tokens.
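The quoted per-round budget and sampling settings can be summarized as a small sketch. The constants come from the paper; the `run_round` helper and the `generate` callable are hypothetical scaffolding standing in for the GPT3.5/GPT4/WizardCoder backends.

```python
# Settings reported in the paper.
NUM_SAMPLES_PER_ROUND = 20   # N = 20 generation samples per problem
TEMPERATURE = 0.6            # default sampling temperature
MAX_OUTPUT_TOKENS = 2048     # maximum output length in tokens

def run_round(generate, problem):
    # Draw the fixed budget of N samples for one generation/revision
    # round. `generate` is any callable
    # (problem, temperature, max_tokens) -> program string.
    return [
        generate(problem, temperature=TEMPERATURE, max_tokens=MAX_OUTPUT_TOKENS)
        for _ in range(NUM_SAMPLES_PER_ROUND)
    ]
```

Keeping the budget fixed per round makes the self-revision rounds directly comparable when selecting the best round on the validation set.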