Coeditor: Leveraging Repo-level Diffs for Code Auto-editing

Authors: Jiayi Wei, Greg Durrett, Isil Dillig

ICLR 2024

Reproducibility checklist. Each entry lists the variable assessed, the result, and the supporting excerpt (LLM response):
Research Type: Experimental. "In a simplified single-round, single-edit task, Coeditor significantly outperforms GPT-3.5 and SOTA open-source code completion models (bringing exact-match accuracy from 34.7 up to 60.4), demonstrating the benefits of incorporating editing history for code completion. In a multi-round, multi-edit setting, we observe substantial gains by iteratively conditioning on additional user edits. We have open-sourced our code, data, and model weights to encourage future research and have released a VSCode extension powered by our model for interactive IDE usage."
Researcher Affiliation: Collaboration. Jiayi Wei, Augment Computing, Inc. (jiayi@augmentcode.com); Greg Durrett and Isil Dillig, University of Texas at Austin ({gdurrett, isil}@cs.utexas.edu).
Pseudocode: No. The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code: Yes. "We have open-sourced our code, data, and model weights to encourage future research and have released a VSCode extension powered by our model for interactive IDE usage." Available at https://github.com/mrvplusone/Coeditor.
Open Datasets: Yes. "We collect a code editing dataset from the commit histories of 1650 open-source Python projects for training and evaluation. ... We release our source code, dataset, model checkpoint, as well as a VSCode extension that supports interactive usage to foster future research." Available at https://github.com/mrvplusone/Coeditor. (An illustrative mining-and-split sketch follows this checklist.)
Dataset Splits: Yes. "We use 50 of the projects for testing and 50 for validation and use the remaining 1,550 projects for training." Table 1 (general statistics of the PyCommits dataset) reports 1,550 training, 50 validation, and 50 test projects. (See the project-split sketch after this checklist.)
Hardware Specification: Yes. "Training took about 5 days on a single NVIDIA Quadro RTX 8000 GPU with 48 GB memory."
Software Dependencies: No. The paper mentions Huggingface's Trainer implementation and the AdamW optimizer but does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or other libraries used.
Experiment Setup: Yes. "We initialize Coeditor with the CodeT5-base checkpoint (220M parameters) and train the model on our training set for 1.75 epochs, gradually increasing the model's reference context size from 2048 tokens to 4096 tokens (at epoch 1) and then to 8192 tokens (at epoch 1.5). We use Huggingface's Trainer implementation and the AdamW optimizer, with a linear learning rate schedule, a starting learning rate of 2e-5, and 0.01 weight decay. We train the model with a fixed batch size of 1 and a total of 1.34 million training steps." (A hedged training-configuration sketch follows this checklist.)
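
The dataset rows above describe mining edits from the commit histories of 1,650 open-source Python projects and splitting them 1,550/50/50 at the project level. Below is a minimal illustrative sketch of such a pipeline. It is not the authors' PyCommits tooling (which extracts finer-grained, function-level edits); the use of GitPython, the random seed, and all function names here are assumptions.

```python
# Illustrative sketch only: mines file-level Python edits from one repo's
# commit history and splits projects 1,550/50/50. The authors' PyCommits
# pipeline (in their repo) extracts finer-grained, function-level edits;
# GitPython, the seed, and every name below are assumptions.
import random

import git  # GitPython (assumed dependency, not named in the paper)


def mine_python_edits(repo_path: str, max_commits: int = 1000):
    """Yield (commit_sha, file_path, unified_diff) for edited .py files."""
    repo = git.Repo(repo_path)
    for commit in repo.iter_commits(max_count=max_commits):
        if len(commit.parents) != 1:  # skip merge and root commits
            continue
        parent = commit.parents[0]
        for diff in parent.diff(commit, create_patch=True):
            path = diff.b_path or diff.a_path
            if path and path.endswith(".py") and diff.diff:
                yield commit.hexsha, path, diff.diff.decode("utf-8", "replace")


def split_projects(projects, seed=0):
    """Project-level split matching the paper's 1,550/50/50 counts."""
    rng = random.Random(seed)  # the paper does not state a seed
    shuffled = list(projects)
    rng.shuffle(shuffled)
    return {"test": shuffled[:50],
            "valid": shuffled[50:100],
            "train": shuffled[100:]}
```

Splitting by project rather than by commit keeps all edits from a given codebase in a single split, which avoids leaking a project's coding conventions from training into evaluation.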
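
The experiment-setup row maps naturally onto Huggingface's Trainer. The sketch below wires up the reported hyperparameters (CodeT5-base initialization, Trainer's default AdamW optimizer, linear schedule from 2e-5, weight decay 0.01, batch size 1, 1.75 epochs). The hub model id, output path, and dataset loader are assumptions, and the paper's context-size curriculum (2048 to 4096 to 8192 tokens) would live in the data pipeline rather than in TrainingArguments.

```python
# Minimal sketch of the reported fine-tuning recipe, not the authors'
# training script. The "Salesforce/codet5-base" hub id and the output
# path are assumptions.
from transformers import T5ForConditionalGeneration, Trainer, TrainingArguments


def build_trainer(train_dataset):
    """Wire up the hyperparameters reported in the paper."""
    # CodeT5-base checkpoint (220M parameters), as in the paper
    model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
    args = TrainingArguments(
        output_dir="coeditor-ckpt",      # hypothetical output path
        per_device_train_batch_size=1,   # fixed batch size of 1
        num_train_epochs=1.75,           # 1.75 epochs over the training set
        learning_rate=2e-5,              # starting learning rate
        lr_scheduler_type="linear",      # linear LR schedule
        weight_decay=0.01,               # AdamW weight decay
    )
    # Trainer defaults to the AdamW optimizer, matching the paper. The
    # 2048 -> 4096 -> 8192 reference-context curriculum is not a
    # TrainingArguments feature; the data pipeline would implement it.
    return Trainer(model=model, args=args, train_dataset=train_dataset)


# Usage (train_dataset would be a PyCommits-style dataset of edit examples):
# trainer = build_trainer(train_dataset)
# trainer.train()
```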