Graph-based, Self-Supervised Program Repair from Diagnostic Feedback
Authors: Michihiro Yasunaga, Percy Liang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed approach on two applications: correcting introductory programming assignments (DeepFix dataset) and correcting the outputs of program synthesis (SPoC dataset). Our final system, DrRepair, significantly outperforms prior work, achieving 68.2% full repair rate on DeepFix (+22.9% over the prior best), and 48.4% synthesis success rate on SPoC (+3.7% over the prior best). |
| Researcher Affiliation | Academia | Michihiro Yasunaga¹, Percy Liang¹ (¹Stanford University, Stanford, CA). Correspondence to: Michihiro Yasunaga <myasu@cs.stanford.edu>. |
| Pseudocode | No | The paper discusses the use of 'pseudocode' as part of the SPoC dataset task (translating pseudocode into C++ implementation), but it does not include any pseudocode or algorithm blocks for its own proposed methods or procedures. |
| Open Source Code | Yes | All code and data are available at https://github.com/michiyasunaga/DrRepair. |
| Open Datasets | Yes | We evaluate the efficacy of our proposed approach on two applications, using publicly available datasets: a) Correcting introductory programming assignments. We use the DeepFix dataset (Gupta et al., 2017), where the task is to repair broken C programs submitted by students. b) Correcting the output code in program synthesis. We use the SPoC dataset (Kulal et al., 2019), where the task is to translate pseudocode into C++ implementation, and programs synthesized by prior models (seq2seq) often fail to compile. |
| Dataset Splits | Yes | We follow the data splits in Kulal et al. (2019), which consist of Train, Dev, TestP, and TestW. We use TestP / TestW for the final evaluation of program synthesis, and use Train/Dev to train or validate our repair model. |
| Hardware Specification | Yes | The parameters of the models are optimized by Adam (Kingma & Ba, 2015), with batch size 25, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2012), on a GPU (GTX Titan X). |
| Software Dependencies | No | The paper mentions optimizers (Adam) and neural network architectures (LSTMs, graph attention networks) but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | We set the dimension of input token embeddings and position embeddings to be 200 and 100. The LSTMs and graph attention networks have a state size of 200. We use 3, 2, 1 and 2 layers for LSTM (1), graph attention net, LSTM (2) and LSTM (3), respectively, with dropout rate 0.3 applied to each layer (Srivastava et al., 2014). The parameters of the models are optimized by Adam (Kingma & Ba, 2015), with batch size 25, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2012), on a GPU (GTX Titan X). |
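
The experiment-setup row quotes a compact set of hyperparameters (embedding sizes, layer counts, dropout, and Adam/clipping settings). The sketch below illustrates how those quoted values could be wired together, assuming PyTorch as the framework (the paper does not name one); the class `RepairModelSketch`, the placeholder loss, and the omission of the graph attention component are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the quoted hyperparameters, assuming PyTorch.
# Names such as `RepairModelSketch` are hypothetical placeholders.
import torch
import torch.nn as nn

TOKEN_EMB_DIM = 200   # "dimension of input token embeddings ... 200"
POS_EMB_DIM = 100     # "position embeddings ... 100"
HIDDEN_DIM = 200      # LSTMs and graph attention nets have a state size of 200
DROPOUT = 0.3         # "dropout rate 0.3 applied to each layer"

class RepairModelSketch(nn.Module):
    def __init__(self, vocab_size: int, max_len: int = 512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, TOKEN_EMB_DIM)
        self.pos_emb = nn.Embedding(max_len, POS_EMB_DIM)
        # LSTM (1): 3 layers; LSTM (2): 1 layer; LSTM (3): 2 layers.
        self.lstm1 = nn.LSTM(TOKEN_EMB_DIM + POS_EMB_DIM, HIDDEN_DIM,
                             num_layers=3, dropout=DROPOUT, batch_first=True)
        self.lstm2 = nn.LSTM(HIDDEN_DIM, HIDDEN_DIM,
                             num_layers=1, batch_first=True)
        self.lstm3 = nn.LSTM(HIDDEN_DIM, HIDDEN_DIM,
                             num_layers=2, dropout=DROPOUT, batch_first=True)
        # The 2-layer graph attention network over program/diagnostic edges
        # is omitted here; in the paper it sits between the LSTM stages.

    def forward(self, tokens, positions):
        x = torch.cat([self.tok_emb(tokens), self.pos_emb(positions)], dim=-1)
        x, _ = self.lstm1(x)
        x, _ = self.lstm2(x)
        x, _ = self.lstm3(x)
        return x

model = RepairModelSketch(vocab_size=10000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # "learning rate 0.0001"

def train_step(batch_tokens, batch_positions):
    # Batch size 25 would be set by the DataLoader producing these tensors.
    optimizer.zero_grad()
    out = model(batch_tokens, batch_positions)
    loss = out.mean()  # placeholder loss; the real objective is repair prediction
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # "gradient clipping 1.0"
    optimizer.step()
```

The sketch only pins down the quoted dimensions, layer counts, dropout, and optimizer settings; the repair decoder and graph component of DrRepair are not reproduced here.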