Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
Authors: Michihiro Yasunaga, Percy Liang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed approach on two applications: correcting introductory programming assignments (DeepFix dataset) and correcting the outputs of program synthesis (SPoC dataset). Our final system, DrRepair, significantly outperforms prior work, achieving 68.2% full repair rate on DeepFix (+22.9% over the prior best), and 48.4% synthesis success rate on SPoC (+3.7% over the prior best). |
| Researcher Affiliation | Academia | Michihiro Yasunaga¹, Percy Liang¹ (¹Stanford University, Stanford, CA). Correspondence to: Michihiro Yasunaga <myasu@cs.stanford.edu>. |
| Pseudocode | No | The paper discusses the use of 'pseudocode' as part of the SPoC dataset task (translating pseudocode into C++ implementation), but it does not include any pseudocode or algorithm blocks for its own proposed methods or procedures. |
| Open Source Code | Yes | All code and data are available at https://github.com/michiyasunaga/DrRepair. |
| Open Datasets | Yes | We evaluate the efficacy of our proposed approach on two applications, using publicly available datasets: a) Correcting introductory programming assignments. We use the DeepFix dataset (Gupta et al., 2017), where the task is to repair broken C programs submitted by students. b) Correcting the output code in program synthesis. We use the SPoC dataset (Kulal et al., 2019), where the task is to translate pseudocode into C++ implementation, and programs synthesized by prior models (seq2seq) often fail to compile. |
| Dataset Splits | Yes | We follow the data splits in Kulal et al. (2019), which consist of Train, Dev, TestP, and TestW. We use TestP / TestW for the final evaluation of program synthesis, and use Train/Dev to train or validate our repair model. |
| Hardware Specification | Yes | The parameters of the models are optimized by Adam (Kingma & Ba, 2015), with batch size 25, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2012), on a GPU (GTX Titan X). |
| Software Dependencies | No | The paper mentions optimizers (Adam) and neural network architectures (LSTMs, graph attention networks) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | We set the dimension of input token embeddings and position embeddings to be 200 and 100. The LSTMs and graph attention networks have a state size of 200. We use 3, 2, 1 and 2 layers for LSTM (1), graph attention net, LSTM (2) and LSTM (3), respectively, with dropout rate 0.3 applied to each layer (Srivastava et al., 2014). The parameters of the models are optimized by Adam (Kingma & Ba, 2015), with batch size 25, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2012), on a GPU (GTX Titan X). |
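
The experiment-setup row quotes a compact set of hyperparameters (embedding sizes, layer counts, dropout, and Adam/clipping settings). The sketch below illustrates how those quoted values could be wired together, assuming PyTorch as the framework (the paper does not name one); the class `RepairModelSketch`, the placeholder loss, and the omission of the graph attention component are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the quoted hyperparameters, assuming PyTorch.
# Names such as `RepairModelSketch` are hypothetical placeholders.
import torch
import torch.nn as nn

TOKEN_EMB_DIM = 200   # "dimension of input token embeddings ... 200"
POS_EMB_DIM = 100     # "position embeddings ... 100"
HIDDEN_DIM = 200      # LSTMs and graph attention nets have a state size of 200
DROPOUT = 0.3         # "dropout rate 0.3 applied to each layer"

class RepairModelSketch(nn.Module):
    def __init__(self, vocab_size: int, max_len: int = 512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, TOKEN_EMB_DIM)
        self.pos_emb = nn.Embedding(max_len, POS_EMB_DIM)
        # LSTM (1): 3 layers; LSTM (2): 1 layer; LSTM (3): 2 layers.
        self.lstm1 = nn.LSTM(TOKEN_EMB_DIM + POS_EMB_DIM, HIDDEN_DIM,
                             num_layers=3, dropout=DROPOUT, batch_first=True)
        self.lstm2 = nn.LSTM(HIDDEN_DIM, HIDDEN_DIM,
                             num_layers=1, batch_first=True)
        self.lstm3 = nn.LSTM(HIDDEN_DIM, HIDDEN_DIM,
                             num_layers=2, dropout=DROPOUT, batch_first=True)
        # The 2-layer graph attention network over program/diagnostic edges
        # is omitted here; in the paper it sits between the LSTM stages.

    def forward(self, tokens, positions):
        x = torch.cat([self.tok_emb(tokens), self.pos_emb(positions)], dim=-1)
        x, _ = self.lstm1(x)
        x, _ = self.lstm2(x)
        x, _ = self.lstm3(x)
        return x

model = RepairModelSketch(vocab_size=10000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # "learning rate 0.0001"

def train_step(batch_tokens, batch_positions):
    # Batch size 25 would be set by the DataLoader producing these tensors.
    optimizer.zero_grad()
    out = model(batch_tokens, batch_positions)
    loss = out.mean()  # placeholder loss; the real objective is repair prediction
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # "gradient clipping 1.0"
    optimizer.step()
```

The sketch only pins down the quoted dimensions, layer counts, dropout, and optimizer settings; the repair decoder and graph component of DrRepair are not reproduced here.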