Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

Authors: Michihiro Yasunaga, Percy Liang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed approach on two applications: correcting introductory programming assignments (DeepFix dataset) and correcting the outputs of program synthesis (SPoC dataset). Our final system, DrRepair, significantly outperforms prior work, achieving 68.2% full repair rate on DeepFix (+22.9% over the prior best), and 48.4% synthesis success rate on SPoC (+3.7% over the prior best).
Researcher Affiliation | Academia | Michihiro Yasunaga, Percy Liang (Stanford University, Stanford, CA). Correspondence to: Michihiro Yasunaga <myasu@cs.stanford.edu>.
Pseudocode | No | The paper discusses the use of 'pseudocode' as part of the SPoC dataset task (translating pseudocode into C++ implementation), but it does not include any pseudocode or algorithm blocks for its own proposed methods or procedures.
Open Source Code | Yes | All code and data are available at https://github.com/michiyasunaga/DrRepair.
Open Datasets | Yes | We evaluate the efficacy of our proposed approach on two applications, using publicly available datasets: a) Correcting introductory programming assignments. We use the DeepFix dataset (Gupta et al., 2017), where the task is to repair broken C programs submitted by students. b) Correcting the output code in program synthesis. We use the SPoC dataset (Kulal et al., 2019), where the task is to translate pseudocode into C++ implementation, and programs synthesized by prior models (seq2seq) often fail to compile.
Dataset Splits | Yes | We follow the data splits in Kulal et al. (2019), which consist of Train, Dev, TestP, and TestW. We use TestP / TestW for the final evaluation of program synthesis, and use Train/Dev to train or validate our repair model.
Hardware Specification | Yes | The parameters of the models are optimized by Adam (Kingma & Ba, 2015), with batch size 25, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2012), on a GPU (GTX Titan X).
Software Dependencies | No | The paper mentions optimizers (Adam) and neural network architectures (LSTMs, graph attention networks) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries).
Experiment Setup | Yes | We set the dimension of input token embeddings and position embeddings to be 200 and 100. The LSTMs and graph attention networks have a state size of 200. We use 3, 2, 1 and 2 layers for LSTM (1), graph attention net, LSTM (2) and LSTM (3), respectively, with dropout rate 0.3 applied to each layer (Srivastava et al., 2014). The parameters of the models are optimized by Adam (Kingma & Ba, 2015), with batch size 25, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2012), on a GPU (GTX Titan X).
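
The hyperparameters quoted in the Hardware Specification and Experiment Setup rows are enough to sketch the training configuration. The snippet below is a minimal sketch, assuming PyTorch purely for illustration (the paper does not name its framework): the class and function names are invented, the 2-layer graph attention component is elided with a comment, and only the reported dimensions, layer counts, dropout rate, optimizer, learning rate, batch size, and gradient-clipping value come from the paper.

```python
# Minimal, hypothetical PyTorch sketch of the reported training configuration.
# Only the numbers (200-d token / 100-d position embeddings, 200-d state size,
# layer counts 3/2/1/2, dropout 0.3, Adam with lr 1e-4, batch size 25, gradient
# clipping 1.0) come from the paper; all names below are invented and the graph
# attention component is elided.
import torch
import torch.nn as nn

BATCH_SIZE = 25  # reported batch size


class RepairEncoderSketch(nn.Module):
    """Stand-in for the encoder stack: LSTM (1) -> graph attention -> LSTM (2) -> LSTM (3)."""

    def __init__(self, vocab_size=50_000, max_positions=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, 200)     # token embeddings: 200-d
        self.pos_emb = nn.Embedding(max_positions, 100)  # position embeddings: 100-d
        # LSTM (1): 3 layers, 200-d state, dropout 0.3 between layers.
        self.lstm1 = nn.LSTM(300, 200, num_layers=3, dropout=0.3, batch_first=True)
        # The 2-layer graph attention network over program/diagnostic edges is
        # omitted here; a faithful implementation would insert it at this point.
        # LSTM (2): 1 layer; LSTM (3): 2 layers, both with 200-d state.
        self.lstm2 = nn.LSTM(200, 200, num_layers=1, batch_first=True)
        self.lstm3 = nn.LSTM(200, 200, num_layers=2, dropout=0.3, batch_first=True)

    def forward(self, tokens, positions):
        x = torch.cat([self.tok_emb(tokens), self.pos_emb(positions)], dim=-1)
        x, _ = self.lstm1(x)
        x, _ = self.lstm2(x)
        x, _ = self.lstm3(x)
        return x


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # paper used a GTX Titan X
model = RepairEncoderSketch().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, learning rate 0.0001


def train_step(tokens, positions, targets, loss_fn):
    """One optimization step with the reported gradient-clipping setting."""
    optimizer.zero_grad()
    loss = loss_fn(model(tokens, positions), targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradients at 1.0
    optimizer.step()
    return loss.item()
```

A real reproduction would additionally need the DrRepair data pipeline and the program-feedback graph construction from the released repository, which this sketch does not attempt.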