TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
Authors: Berkay Berabi, Jingxuan He, Veselin Raychev, Martin Vechev
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it is able to synthesize code that fixes the error in 67 percent of cases and significantly outperforms existing learning-based approaches. We conduct an extensive evaluation of TFix on a massive dataset of JavaScript programs to fix 52 error types detected by ESLint. |
| Researcher Affiliation | Collaboration | Berkay Berabi 1,2, Jingxuan He 1, Veselin Raychev 1,2, Martin Vechev 1. 1 Department of Computer Science, ETH Zurich, Switzerland; 2 Snyk, Switzerland. |
| Pseudocode | Yes | We present the data extraction and cleaning procedure of TFix in Algorithm 1. (Algorithm 1 is presented in a structured block). |
| Open Source Code | Yes | The code, trained model, and dataset can be found at https://github.com/eth-sri/TFix. |
| Open Datasets | Yes | The code, trained model, and dataset can be found at https://github.com/eth-sri/TFix. We ran Algorithm 1 on 5.5 million commits obtained from the top 500k GitHub public repositories based on the number of stars and extracted a dataset of fixes for 52 error types detected by ESLint. |
| Dataset Splits | Yes | To create a train-test split, we randomly selected 10% of the samples for each error type as the test set (we call it clean test). The remaining was further split into 90% for fine-tuning and 10% for validation. (A per-error-type split sketch follows the table.) |
| Hardware Specification | Yes | TFix was fine-tuned on 8 GPUs (GeForce RTX 2080 Ti) with a batch size of 32. |
| Software Dependencies | No | The paper mentions 'T5 large' as the model and 'transformers library' for implementation, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | For fine-tuning, we used Adam (Kingma & Ba, 2015) with the learning rate initialized to 10^-4. We set warm-up iterations to 500 and applied linear learning rate scheduling. TFix was fine-tuned on 8 GPUs (GeForce RTX 2080 Ti) with a batch size of 32. The fine-tuning ran for 30 epochs, which took 3-4 days, and applied validation after each epoch. During inference, we used beam search with a beam size of five. (A configuration sketch follows the table.) |
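
The per-error-type split described in the Dataset Splits row (10% of each error type held out as the clean test set, the remainder split 90/10 into fine-tuning and validation data) can be reproduced along the following lines. This is a minimal sketch; the sample structure, a list of dicts with an `error_type` field, is an assumption and not taken from the released dataset.

```python
# Minimal sketch of the per-error-type split described in the paper.
# Assumes `samples` is a list of dicts with an "error_type" key (assumption).
import random
from collections import defaultdict

def split_dataset(samples, seed=42):
    by_type = defaultdict(list)
    for s in samples:
        by_type[s["error_type"]].append(s)

    train, val, test = [], [], []
    rng = random.Random(seed)
    for error_type, group in by_type.items():
        rng.shuffle(group)
        n_test = max(1, round(0.10 * len(group)))   # 10% per error type -> clean test
        test.extend(group[:n_test])
        rest = group[n_test:]
        n_val = max(1, round(0.10 * len(rest)))     # 10% of the remainder -> validation
        val.extend(rest[:n_val])
        train.extend(rest[n_val:])                  # remaining 90% -> fine-tuning
    return train, val, test
```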
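The reported fine-tuning setup maps onto the Hugging Face transformers Trainer API roughly as follows. Only the hyperparameters (T5-large, learning rate 10^-4, 500 warm-up steps, linear scheduling, an effective batch size of 32 across 8 GPUs, 30 epochs with per-epoch validation, beam size 5 at inference) come from the paper; the Trainer-based script, argument names, and the example input are assumptions, not the authors' released code.

```python
# Hedged sketch of the reported fine-tuning configuration using the
# Hugging Face transformers library (script structure is an assumption).
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-large")

training_args = Seq2SeqTrainingArguments(
    output_dir="tfix-t5-large",
    per_device_train_batch_size=4,   # 8 GPUs x 4 per device = effective batch size 32
    num_train_epochs=30,             # 30 epochs
    learning_rate=1e-4,              # learning rate initialized to 10^-4 (paper uses Adam)
    warmup_steps=500,                # 500 warm-up iterations
    lr_scheduler_type="linear",      # linear learning rate scheduling
    evaluation_strategy="epoch",     # validation after each epoch
    predict_with_generate=True,
    generation_num_beams=5,          # beam search with beam size 5
)

# A Seq2SeqTrainer would be built from these arguments together with
# tokenized train/validation datasets (not shown here).

# Inference with beam search; the input format below is a simplified
# placeholder, not the exact encoding used by TFix.
inputs = tokenizer(
    "fix no-undef: 'x' is not defined\nconsole.log(x);",
    return_tensors="pt",
)
fix_ids = model.generate(**inputs, num_beams=5, max_length=128)
print(tokenizer.decode(fix_ids[0], skip_special_tokens=True))
```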