Break-It-Fix-It: Unsupervised Learning for Program Repair

Authors: Michihiro Yasunaga, Percy Liang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate BIFI on two code repair datasets: GitHub-Python, a new dataset we introduce where the goal is to repair Python code with AST parse errors; and DeepFix, where the goal is to repair C code with compiler errors. BIFI outperforms state-of-the-art methods, obtaining 90.5% repair accuracy on GitHub-Python (+28.5%) and 71.7% on DeepFix (+5.6%).
Researcher Affiliation | Academia | Michihiro Yasunaga and Percy Liang, Stanford University, Stanford, CA.
Pseudocode | No | The paper describes the algorithm steps in paragraph text and equations (Eq. 3-9) but does not include formal pseudocode or a clearly labeled algorithm block. (A sketch of the training loop appears after this table.)
Open Source Code | Yes | Code and data are available at https://github.com/michiyasunaga/bifi.
Open Datasets | Yes | Code and data are available at https://github.com/michiyasunaga/bifi (for GitHub-Python) and https://bitbucket.org/iiscseal/deepfix (for DeepFix).
Dataset Splits | Yes | We hold out 1% of P_synthetic as our dev set, which we use to perform early stopping. (A holdout sketch appears after this table.)
Hardware Specification | Yes | on one GPU (GTX Titan X).
Software Dependencies | No | The paper mentions specific methods such as the Transformer architecture and the Adam optimizer, but does not give version numbers for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | For the architecture of the fixer and breaker, we use the encoder-decoder Transformer (Vaswani et al., 2017) with 4 layers, 8 attention heads, and hidden states of size 256. The model parameters are optimized by Adam (Kingma & Ba, 2015), with batch size of 20,000 tokens, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2013). For generation, we use beam search with beam size 10, and keep predictions with Levenshtein edit distance less than 5 tokens from the input. (A configuration sketch appears after this table.)
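
Because the paper provides no labeled algorithm block, the following minimal Python sketch of the BIFI training loop may help orient readers. It follows the steps described in the paper's text, but every helper name (`train_seq2seq`, `critic_accepts`, `generate`) is a hypothetical stand-in, not an identifier from the released code.

```python
# Hedged sketch of one BIFI round. fixer and breaker are seq2seq models;
# critic_accepts stands in for the critic (AST parser or compiler check).
# All helper names are hypothetical, not taken from the released code.

def bifi_round(fixer, breaker, bad_real, good_real, critic_accepts):
    # 1) Run the fixer on real broken code; keep only critic-verified fixes.
    fixed_pairs = []
    for x in bad_real:
        y = fixer.generate(x)
        if critic_accepts(y):
            fixed_pairs.append((x, y))          # (bad, good) pair

    # 2) Train the breaker on the reversed, verified pairs (good -> bad).
    breaker = train_seq2seq(breaker, [(y, x) for x, y in fixed_pairs])

    # 3) Run the breaker on real good code; keep outputs the critic rejects,
    #    so the synthesized "bad" code is genuinely broken.
    broken_pairs = []
    for y in good_real:
        x = breaker.generate(y)
        if not critic_accepts(x):
            broken_pairs.append((x, y))         # (bad, good) pair

    # 4) Retrain the fixer on verified plus breaker-synthesized pairs.
    fixer = train_seq2seq(fixer, fixed_pairs + broken_pairs)
    return fixer, breaker
```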
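The 1% dev holdout noted in the Dataset Splits row could look like the sketch below; the variable names and the fixed seed are illustrative assumptions, not details from the paper.

```python
import random

def split_dev(synthetic_pairs, dev_frac=0.01, seed=0):
    """Hold out a dev fraction (1% here) of synthetic (bad, good) pairs."""
    pairs = list(synthetic_pairs)
    random.Random(seed).shuffle(pairs)      # deterministic shuffle
    n_dev = max(1, int(len(pairs) * dev_frac))
    return pairs[n_dev:], pairs[:n_dev]     # (train, dev)
```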
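The quoted experiment setup translates roughly into the PyTorch sketch below. The hyperparameters (4 layers, 8 heads, hidden size 256, Adam with learning rate 0.0001, gradient clipping 1.0, filtering at edit distance 5) come from the quote; the use of `torch.nn.Transformer` and the token-level Levenshtein filter are assumptions, and the released code may be organized differently.

```python
import torch
import torch.nn as nn

# Hyperparameters from the quoted setup; other nn.Transformer defaults
# (e.g., feed-forward size) are assumptions, as the quote does not say.
model = nn.Transformer(
    d_model=256,            # hidden states of size 256
    nhead=8,                # 8 attention heads
    num_encoder_layers=4,   # 4-layer encoder
    num_decoder_layers=4,   # 4-layer decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(loss):
    # Batching at 20,000 tokens would be handled by the data loader (not shown).
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()

def token_edit_distance(a, b):
    """Token-level Levenshtein distance between two token lists."""
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ta != tb))
    return dp[-1]

def filter_beam(input_tokens, candidates, max_dist=5):
    """Keep beam predictions within 5 tokens' edit distance of the input."""
    return [c for c in candidates
            if token_edit_distance(input_tokens, c) < max_dist]
```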