Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Break-It-Fix-It: Unsupervised Learning for Program Repair
Authors: Michihiro Yasunaga, Percy Liang
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate BIFI on two code repair datasets: GitHub-Python, a new dataset we introduce where the goal is to repair Python code with AST parse errors; and DeepFix, where the goal is to repair C code with compiler errors. BIFI outperforms state-of-the-art methods, obtaining 90.5% repair accuracy on GitHub-Python (+28.5%) and 71.7% on DeepFix (+5.6%). |
| Researcher Affiliation | Academia | Michihiro Yasunaga and Percy Liang, Stanford University, Stanford, CA. |
| Pseudocode | No | The paper describes the algorithm steps in paragraph text and equations (Eq 3-9) but does not include formal pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code and data are available at https://github.com/michiyasunaga/bifi. |
| Open Datasets | Yes | Code and data are available at https://github.com/michiyasunaga/bifi (GitHub-Python) and https://bitbucket.org/iiscseal/deepfix (DeepFix). |
| Dataset Splits | Yes | We hold out 1% of P_synthetic as our dev set, which we use to perform early stopping. |
| Hardware Specification | Yes | on one GPU (GTX Titan X). |
| Software Dependencies | No | The paper mentions using specific algorithms like Transformer and Adam, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or TensorFlow libraries. |
| Experiment Setup | Yes | For the architecture of the fixer and breaker, we use the encoder-decoder Transformer (Vaswani et al., 2017) with 4 layers, 8 attention heads, and hidden states of size 256. The model parameters are optimized by Adam (Kingma & Ba, 2015), with a batch size of 20,000 tokens, learning rate 0.0001, and gradient clipping 1.0 (Pascanu et al., 2013). For generation, we use beam search with beam size 10, and keep predictions with Levenshtein edit distance less than 5 tokens from the input. |
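The generation filter quoted in the Experiment Setup row (keep beam-search candidates within a token-level Levenshtein edit distance of 5 from the input) can be sketched as follows. This is a minimal illustration, not the paper's code; the function names and the use of whitespace tokenization are assumptions.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two token sequences
    (insertion, deletion, substitution; each costs 1)."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            cost = 0 if ta == tb else 1
            curr.append(min(prev[j] + 1,          # delete ta
                            curr[j - 1] + 1,      # insert tb
                            prev[j - 1] + cost))  # match / substitute
        prev = curr
    return prev[-1]

def filter_beam(src_tokens, candidates, max_edits=5):
    """Keep only beam outputs whose token edit distance to the input
    is below max_edits, discarding rewrites that drift too far."""
    return [c for c in candidates
            if token_edit_distance(src_tokens, c) < max_edits]

# Hypothetical example: repairing a Python snippet with a parse error.
src = "def f ( x ) return x".split()
candidates = [
    "def f ( x ) : return x".split(),   # small fix: distance 1, kept
    "class g : pass pass pass".split(), # unrelated rewrite, dropped
]
kept = filter_beam(src, candidates)
```

Restricting outputs to a small edit distance matches the task: repairing a parse or compiler error usually requires changing only a few tokens, so candidates far from the input are more likely to be unfaithful rewrites than repairs.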