CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive ablation studies with different methods of prompting, number of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.
Researcher Affiliation | Industry | Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty — Salesforce Research — {hungle, hailin.chen, amrita.saha}@salesforce.com
Pseudocode | No | The paper includes diagrams and examples of prompts (Figures 2, 3, 4, 10), but it does not contain a formally labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | https://github.com/SalesforceAIResearch/CodeChain
Open Datasets | Yes | We demonstrate the efficacy of CodeChain on challenging code generation tasks, specifically, on two major benchmarks: APPS (Hendrycks et al., 2021), and CodeContests (Li et al., 2022).
Dataset Splits | Yes | On APPS and CodeContests, we reported the results on the test split following the best self-revision round performance on the validation set. ... Note that on APPS, as the original benchmark does not include a specific validation split, we randomly selected samples from the original training split and reported validation results on this set.
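The quoted split procedure (carving a validation set out of the APPS training split at random) can be sketched as below. The function name, seed, and split size are illustrative assumptions; the paper does not report these values.

```python
import random

def make_validation_split(train_problems, num_val, seed=0):
    # APPS ships without an official validation split, so a validation
    # set is sampled at random from the original training split.
    # `seed` and `num_val` are hypothetical; the paper does not state them.
    rng = random.Random(seed)
    indices = list(range(len(train_problems)))
    rng.shuffle(indices)
    val = [train_problems[i] for i in indices[:num_val]]
    train = [train_problems[i] for i in indices[num_val:]]
    return train, val
```

Fixing the seed keeps the held-out set stable across self-revision rounds, so the "best round on validation" selection compares like with like.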
Hardware Specification | No | The paper specifies the base language models used (OpenAI's GPT3.5 and GPT4, WizardCoder) and that Hugging Face-hosted model parameters and vLLM were utilized for generation. However, it does not provide specific details on the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | Yes | We applied CodeChain to both open-sourced and closed-sourced pretrained LLMs, including OpenAI's GPT3.5 and GPT4 (Koubaa, 2023), and WizardCoder (Luo et al., 2023). ... For WizardCoder, we utilized the Hugging Face-hosted model parameters (Wolf et al., 2019) and vLLM (Kwon et al., 2023) to generate programs. ... we chose to use StarEncoder (Li et al., 2023) to embed sampled sub-modules throughout all experiments.
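CodeChain's core step clusters the StarEncoder embeddings of sampled sub-modules and carries one representative per cluster into the next revision round. A toy version of the representative-selection step, assuming embeddings and cluster labels are already computed, might look like this; picking the member closest to the cluster centroid is an illustrative rule, not a detail quoted in this report.

```python
from collections import defaultdict

def representative_per_cluster(embeddings, labels):
    # Group sub-module embeddings by cluster label, then pick the
    # member nearest (squared Euclidean distance) to its cluster
    # centroid as that cluster's representative.
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    reps = {}
    for lab, members in groups.items():
        dim = len(embeddings[members[0]])
        centroid = [
            sum(embeddings[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        reps[lab] = min(
            members,
            key=lambda i: sum((embeddings[i][d] - centroid[d]) ** 2 for d in range(dim)),
        )
    return reps  # cluster label -> index of representative sub-module
```

In the actual pipeline, the representatives' source code would be placed back into the revision prompt for the next round.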
Experiment Setup | Yes | To apply CodeChain, we fixed the budget in each generation/revision round to N = 20 generation samples per problem. ... We adopted a default temperature of 0.6 to generate output tokens and a max output length of 2048 tokens.
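The quoted per-round budget and sampling settings can be summarized as a small sketch. The constants come from the paper; the `run_round` helper and the `generate` callable are hypothetical scaffolding standing in for the GPT3.5/GPT4/WizardCoder backends.

```python
# Settings reported in the paper.
NUM_SAMPLES_PER_ROUND = 20   # N = 20 generation samples per problem
TEMPERATURE = 0.6            # default sampling temperature
MAX_OUTPUT_TOKENS = 2048     # maximum output length in tokens

def run_round(generate, problem):
    # Draw the fixed budget of N samples for one generation/revision
    # round. `generate` is any callable
    # (problem, temperature, max_tokens) -> program string.
    return [
        generate(problem, temperature=TEMPERATURE, max_tokens=MAX_OUTPUT_TOKENS)
        for _ in range(NUM_SAMPLES_PER_ROUND)
    ]
```

Keeping the budget fixed per round makes the self-revision rounds directly comparable when selecting the best round on the validation set.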