CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive ablation studies with different methods of prompting, number of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success. |
| Researcher Affiliation | Industry | Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty Salesforce Research {hungle, hailin.chen, amrita.saha}@salesforce.com |
| Pseudocode | No | The paper includes diagrams and examples of prompts (Figures 2, 3, 4, 10), but it does not contain a formally labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | https://github.com/SalesforceAIResearch/CodeChain |
| Open Datasets | Yes | We demonstrate the efficacy of CodeChain on challenging code generation tasks, specifically, on two major benchmarks: APPS (Hendrycks et al., 2021), and CodeContests (Li et al., 2022). |
| Dataset Splits | Yes | On APPS and CodeContests, we reported the results on the test split following the best self-revision round performance on the validation set. ... Note that on APPS, as the original benchmark does not include a specific validation split, we randomly selected samples from the original training split and reported validation results on this set. |
| Hardware Specification | No | The paper specifies the base language models used (OpenAI's GPT-3.5 and GPT-4, WizardCoder) and that Hugging Face-hosted model parameters and vLLM were utilized for generation. However, it does not provide specific details on the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | Yes | We applied CodeChain to both open-sourced and closed-sourced pretrained LLMs, including OpenAI's GPT-3.5 and GPT-4 (Koubaa, 2023), and WizardCoder (Luo et al., 2023). ... For WizardCoder, we utilized the Hugging Face-hosted model parameters (Wolf et al., 2019) and vLLM (Kwon et al., 2023) to generate programs. ... we chose to use StarEncoder (Li et al., 2023) to embed sampled sub-modules throughout all experiments. |
| Experiment Setup | Yes | To apply CodeChain, we fixed the budget in each generation/revision round to N = 20 generation samples per problem. ... We adopted a default temperature of 0.6 to generate output tokens and a max output length of 2048 tokens. |
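The setup above (N = 20 samples per round, temperature 0.6, max 2048 tokens) feeds a clustering step that selects representative sub-modules for the next self-revision round. The sketch below is a hypothetical, pure-Python illustration of that selection step, not the paper's implementation: it assumes sub-module embeddings (e.g., from StarEncoder) are already computed, runs a toy k-means, and returns the member closest to each centroid as the cluster representative.

```python
import random

# Generation budget per problem, taken from the paper's reported setup.
SAMPLING = {"n": 20, "temperature": 0.6, "max_tokens": 2048}

def kmeans_representatives(embeddings, k, iters=50, seed=0):
    """Toy k-means over sub-module embeddings; returns one representative
    index per non-empty cluster (the member nearest its centroid)."""
    rng = random.Random(seed)
    centroids = [list(e) for e in rng.sample(embeddings, k)]

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each embedding to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for i, e in enumerate(embeddings):
            clusters[min(range(k), key=lambda j: dist2(e, centroids[j]))].append(i)
        # Update step: recompute centroids as cluster means.
        dim = len(embeddings[0])
        new = []
        for j, members in enumerate(clusters):
            if members:
                new.append([sum(embeddings[i][d] for i in members) / len(members)
                            for d in range(dim)])
            else:
                new.append(centroids[j])  # keep empty clusters' centroids fixed
        if new == centroids:
            break
        centroids = new
    # Representative = the actual sampled sub-module closest to each centroid.
    return sorted(min(members, key=lambda i: dist2(embeddings[i], centroids[j]))
                  for j, members in enumerate(clusters) if members)
```

For example, embedding four sub-modules as `[[0, 0], [0.1, 0], [5, 5], [5.1, 5]]` and asking for `k=2` yields one representative index from each tight group, which would then be re-inserted into the revision prompt.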