Break-It-Fix-It: Unsupervised Learning for Program Repair
Authors: Michihiro Yasunaga, Percy Liang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BIFI on two code repair datasets: GitHub-Python, a new dataset we introduce where the goal is to repair Python code with AST parse errors; and DeepFix, where the goal is to repair C code with compiler errors. BIFI outperforms state-of-the-art methods, obtaining 90.5% repair accuracy on GitHub-Python (+28.5 percentage points) and 71.7% on DeepFix (+5.6 percentage points). |
| Researcher Affiliation | Academia | Michihiro Yasunaga and Percy Liang, Stanford University, Stanford, CA. |
| Pseudocode | No | The paper describes the algorithm steps in prose and equations (Eqs. 3-9) but does not include formal pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code and data are available at https://github.com/michiyasunaga/bifi. |
| Open Datasets | Yes | Code and data are available at https://github.com/michiyasunaga/bifi (for GitHub-Python) and https://bitbucket.org/iiscseal/deepfix (for DeepFix). |
| Dataset Splits | Yes | We hold out 1% of $P_{\text{synthetic}}$ as our dev set, which we use for early stopping. |
| Hardware Specification | Yes | "...on one GPU (GTX Titan X)." |
| Software Dependencies | No | The paper names the model architecture (Transformer) and the optimizer (Adam) but gives no version numbers for software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For the architecture of the fixer and breaker, we use the encoder-decoder Transformer (Vaswani et al., 2017) with 4 layers, 8 attention heads, and hidden states of size 256. The model parameters are optimized by Adam (Kingma & Ba, 2015), with a batch size of 20,000 tokens, learning rate 0.0001, and gradient clipping at 1.0 (Pascanu et al., 2013). For generation, we use beam search with beam size 10 and keep predictions whose Levenshtein edit distance from the input is less than 5 tokens. |
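The Experiment Setup row describes a generation filter: beam predictions are kept only if their Levenshtein edit distance from the input is under 5 tokens. The Python sketch below illustrates that filter. It is not the authors' code. It assumes whitespace tokenization as a stand-in for the paper's code tokenizer and assumes the beam outputs are already available as strings. `token_edit_distance` and `filter_beam` are hypothetical names.

```python
from typing import List, Sequence


def token_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (insert/delete/substitute cost 1)."""
    # prev[j] holds the distance between the first i-1 tokens of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute (or match)
            )
        prev = curr
    return prev[-1]


def filter_beam(source: str, candidates: List[str], max_dist: int = 5) -> List[str]:
    """Keep candidates whose token edit distance from `source` is strictly below max_dist.

    Assumption: whitespace tokenization stands in for a real code tokenizer.
    """
    src_tokens = source.split()
    return [c for c in candidates if token_edit_distance(src_tokens, c.split()) < max_dist]


if __name__ == "__main__":
    broken = "def f ( x ) : return x +"
    beam = [
        "def f ( x ) : return x + 1",  # one insertion: kept
        "def f ( x ) : return x",      # one deletion: kept
        "class A : pass",              # rewrites most of the input: dropped
    ]
    print(filter_beam(broken, beam))
```

The edit-distance bound limits the fixer (or breaker) to small, local changes, so a candidate that rewrites most of the input is discarded even if it scores well in the beam.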
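The Dataset Splits row holds out 1% of the synthetic paired data as a dev set for early stopping. The paper does not say how the held-out examples are sampled. The sketch below assumes a fixed-seed random shuffle, and `holdout_split` is a hypothetical helper.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(pairs: List[T], dev_frac: float = 0.01, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `pairs` with a fixed seed and hold out `dev_frac` of it.

    Returns (train, dev). At least one example goes to dev when `pairs` is non-empty.
    """
    if not 0.0 < dev_frac < 1.0:
        raise ValueError("dev_frac must be in (0, 1)")
    shuffled = list(pairs)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded RNG keeps the split reproducible
    n_dev = max(1, round(len(shuffled) * dev_frac)) if shuffled else 0
    return shuffled[n_dev:], shuffled[:n_dev]


if __name__ == "__main__":
    # Hypothetical (broken, fixed) pairs standing in for the synthetic data
    data = [(f"bad_{i}", f"good_{i}") for i in range(1000)]
    train, dev = holdout_split(data)
    print(len(train), len(dev))  # 990 10
```

A fixed seed makes the dev set identical across runs, which keeps early-stopping decisions comparable between training rounds.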