AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

Authors: Zifan Song, Yudong Wang, Wenwei Zhang, Kuikun Liu, Chengqi Lyu, Demin Song, Qipeng Guo, Hang Yan, Dahua Lin, Kai Chen, Cairong Zhao

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that AlchemistCoder holds a clear lead among all models of the same size (6.7B/7B) and rivals or even surpasses larger models (15B/33B/70B), showcasing the efficacy of our method in refining instruction-following capabilities and advancing the boundaries of code intelligence.
Researcher Affiliation | Collaboration | Zifan Song (1,2), Yudong Wang (2), Wenwei Zhang (2), Kuikun Liu (2), Chengqi Lyu (2), Demin Song (2), Qipeng Guo (2), Hang Yan (2), Dahua Lin (2,3,4), Kai Chen (2), Cairong Zhao (1); (1) Tongji University, (2) Shanghai AI Laboratory, (3) MMLab, The Chinese University of Hong Kong, (4) HKGAI under InnoHK
Pseudocode | Yes | Figure 4: Detailed prompt designed for generating data-specific AlchemistPrompts. ... Figure A3: Detailed prompt designed for generating code review data.
Open Source Code | Yes | Source code and models are available at https://github.com/InternLM/AlchemistCoder. ... We have released our code, data, and AlchemistCoder series models at https://internlm.github.io/AlchemistCoder.
Open Datasets | Yes | Our AlchemistCoder dataset (~200M tokens) comprises four types of multi-source data, encompassing open-source datasets and three types of data constructed by us. Specifically, (a) open-source datasets including Evol-Instruct-Code-80k-v1 [10], CodeExercise-Python-27k [9], and evol-codealpaca-v1 [39] (a data-assembly sketch follows the table)
Dataset Splits | No | The paper uses various open-source datasets and constructs its own AlchemistCoder dataset, but it does not explicitly provide training/validation/test splits of this combined dataset for hyperparameter tuning or early stopping; evaluation is performed on external benchmarks such as HumanEval and MBPP.
Hardware Specification | Yes | fine-tune all the base models for 2 epochs using 32 NVIDIA A100-80GB GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer and generating Python code, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch x.x, CUDA x.x).
Experiment Setup | Yes | We set the initial learning rate, minimum learning rate, and optimizer warmup steps at 1e-4, 6e-6, and 15, respectively. We use the Adam optimizer [28] and choose a batch size of 2 with a sequence length of 8192. (A configuration sketch in PyTorch follows the table.)
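To make the multi-source composition in the Open Datasets row concrete, the following is a minimal sketch (not the authors' released pipeline) of merging the three named open-source datasets into a single instruction-tuning corpus with a provenance tag per example. The local file names and the `instruction`/`output` field names are hypothetical placeholders.

```python
# Minimal sketch: merge three open-source code-instruction datasets into one corpus.
# File names and record field names are hypothetical; only the dataset names come
# from the paper's description of its open-source sources.
import json
import random

SOURCES = {
    "evol_instruct_code_80k": "Evol-Instruct-Code-80k-v1.jsonl",   # hypothetical local export
    "codeexercise_python_27k": "CodeExercise-Python-27k.jsonl",    # hypothetical local export
    "evol_codealpaca_v1": "evol-codealpaca-v1.jsonl",              # hypothetical local export
}

def load_jsonl(path):
    """Read one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

corpus = []
for source_name, path in SOURCES.items():
    for record in load_jsonl(path):
        # Keep provenance so source-specific handling (e.g. per-source prompts)
        # remains possible downstream.
        corpus.append({
            "source": source_name,
            "instruction": record["instruction"],  # field name is an assumption
            "output": record["output"],            # field name is an assumption
        })

random.shuffle(corpus)  # mix the sources before fine-tuning
print(f"combined examples: {len(corpus)}")
```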
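The Experiment Setup row fixes the main fine-tuning hyperparameters. Below is a minimal sketch of that configuration in PyTorch; the paper reports Adam [28], the endpoint learning rates, 15 warmup steps, batch size 2, sequence length 8192, and 2 epochs, but it does not state the decay shape, so the cosine decay from the initial to the minimum learning rate is an assumption.

```python
# Sketch of the reported fine-tuning configuration (values from the paper's
# Experiment Setup; the cosine decay shape is an assumption).
import math
import torch

INIT_LR = 1e-4      # initial learning rate (reported)
MIN_LR = 6e-6       # minimum learning rate (reported)
WARMUP_STEPS = 15   # optimizer warmup steps (reported)
BATCH_SIZE = 2      # batch size (reported)
SEQ_LEN = 8192      # sequence length (reported)
EPOCHS = 2          # fine-tuning epochs (reported)

def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to INIT_LR, then (assumed) cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return INIT_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return MIN_LR + 0.5 * (INIT_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # The paper names Adam; betas and weight decay are unspecified, so defaults are kept.
    return torch.optim.Adam(model.parameters(), lr=INIT_LR)
```

In practice, `lr_at_step` could be wrapped with `torch.optim.lr_scheduler.LambdaLR` by returning `lr_at_step(step, total_steps) / INIT_LR` as the multiplicative factor.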