Coeditor: Leveraging Repo-level Diffs for Code Auto-editing

Authors: Jiayi Wei, Greg Durrett, Isil Dillig

ICLR 2024

Reproducibility checklist. Each entry lists the variable assessed, the result, and the supporting excerpt (LLM response):
Research Type: Experimental. "In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models (bringing exact-match accuracy from 34.7 up to 60.4), demonstrating the benefits of incorporating editing history for code completion. In a multi-round, multi-edit setting, we observe substantial gains by iteratively conditioning on additional user edits. We have open-sourced our code, data, and model weights to encourage future research and have released a VSCode extension powered by our model for interactive IDE usage."
Researcher Affiliation: Collaboration. Jiayi Wei, Augment Computing, Inc. (jiayi@augmentcode.com); Greg Durrett and Isil Dillig, University of Texas at Austin ({gdurrett, isil}@cs.utexas.edu).
Pseudocode: No. The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: Yes. "We have open-sourced our code, data, and model weights to encourage future research and have released a VSCode extension powered by our model for interactive IDE usage." Available at https://github.com/mrvplusone/Coeditor.
Open Datasets: Yes. "We collect a code editing dataset from the commit histories of 1650 open-source Python projects for training and evaluation. ... We release our source code, dataset, model checkpoint, as well as a VSCode extension that supports interactive usage to foster future research." Available at https://github.com/mrvplusone/Coeditor. (An illustrative mining-and-split sketch follows this checklist.)
Dataset Splits: Yes. "We use 50 of the projects for testing and 50 for validation and use the remaining 1,550 projects for training." Table 1 (general statistics of the PyCommits dataset) reports 1,550 training, 50 validation, and 50 test projects. (See the project-split sketch after this checklist.)
Hardware Specification: Yes. "Training took about 5 days on a single NVIDIA Quadro RTX 8000 GPU with 48 GB memory."
Software Dependencies: No. The paper mentions Huggingface's Trainer implementation and the AdamW optimizer but does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or other libraries used.
Experiment Setup: Yes. "We initialize Coeditor with the CodeT5-base checkpoint (220M parameters) and train the model on our training set for 1.75 epochs, gradually increasing the model's reference context size from 2048 tokens to 4096 tokens (at epoch 1) and then to 8192 tokens (at epoch 1.5). We use Huggingface's Trainer implementation and the AdamW optimizer, with a linear learning rate schedule, a starting learning rate of 2e-5, and 0.01 weight decay. We train the model with a fixed batch size of 1 and a total of 1.34 million training steps." (A hedged training-configuration sketch follows this checklist.)
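
The dataset rows above describe mining edits from the commit histories of 1,650 open-source Python projects and splitting them 1,550/50/50 at the project level. Below is a minimal illustrative sketch of such a pipeline. It is not the authors' PyCommits tooling (which extracts finer-grained, function-level edits); the use of GitPython, the random seed, and all function names here are assumptions.

```python
# Illustrative sketch only: mines file-level Python edits from one repo's
# commit history and splits projects 1,550/50/50. The authors' PyCommits
# pipeline (in their repo) extracts finer-grained, function-level edits;
# GitPython, the seed, and every name below are assumptions.
import random

import git  # GitPython (assumed dependency, not named in the paper)


def mine_python_edits(repo_path: str, max_commits: int = 1000):
    """Yield (commit_sha, file_path, unified_diff) for edited .py files."""
    repo = git.Repo(repo_path)
    for commit in repo.iter_commits(max_count=max_commits):
        if len(commit.parents) != 1:  # skip merge and root commits
            continue
        parent = commit.parents[0]
        for diff in parent.diff(commit, create_patch=True):
            path = diff.b_path or diff.a_path
            if path and path.endswith(".py") and diff.diff:
                yield commit.hexsha, path, diff.diff.decode("utf-8", "replace")


def split_projects(projects, seed=0):
    """Project-level split matching the paper's 1,550/50/50 counts."""
    rng = random.Random(seed)  # the paper does not state a seed
    shuffled = list(projects)
    rng.shuffle(shuffled)
    return {"test": shuffled[:50],
            "valid": shuffled[50:100],
            "train": shuffled[100:]}
```

Splitting by project rather than by commit keeps all edits from a given codebase in a single split, which avoids leaking a project's coding conventions from training into evaluation.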
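
The experiment-setup row maps naturally onto Huggingface's Trainer. The sketch below wires up the reported hyperparameters (CodeT5-base initialization, Trainer's default AdamW optimizer, linear schedule from 2e-5, weight decay 0.01, batch size 1, 1.75 epochs). The hub model id, output path, and dataset loader are assumptions, and the paper's context-size curriculum (2048 to 4096 to 8192 tokens) would live in the data pipeline rather than in TrainingArguments.

```python
# Minimal sketch of the reported fine-tuning recipe, not the authors'
# training script. The "Salesforce/codet5-base" hub id and the output
# path are assumptions.
from transformers import T5ForConditionalGeneration, Trainer, TrainingArguments


def build_trainer(train_dataset):
    """Wire up the hyperparameters reported in the paper."""
    # CodeT5-base checkpoint (220M parameters), as in the paper
    model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
    args = TrainingArguments(
        output_dir="coeditor-ckpt",      # hypothetical output path
        per_device_train_batch_size=1,   # fixed batch size of 1
        num_train_epochs=1.75,           # 1.75 epochs over the training set
        learning_rate=2e-5,              # starting learning rate
        lr_scheduler_type="linear",      # linear LR schedule
        weight_decay=0.01,               # AdamW weight decay
    )
    # Trainer defaults to the AdamW optimizer, matching the paper. The
    # 2048 -> 4096 -> 8192 reference-context curriculum is not a
    # TrainingArguments feature; the data pipeline would implement it.
    return Trainer(model=model, args=args, train_dataset=train_dataset)


# Usage (train_dataset would be a PyCommits-style dataset of edit examples):
# trainer = build_trainer(train_dataset)
# trainer.train()
```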