TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer

Authors: Berkay Berabi, Jingxuan He, Veselin Raychev, Martin Vechev

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it is able to synthesize code that fixes the error in 67% of cases and significantly outperforms existing learning-based approaches. We conduct an extensive evaluation of TFix on a massive dataset of JavaScript programs to fix 52 error types detected by ESLint.
Researcher Affiliation | Collaboration | Berkay Berabi (1,2), Jingxuan He (1), Veselin Raychev (1,2), Martin Vechev (1). 1: Department of Computer Science, ETH Zurich, Switzerland; 2: Snyk, Switzerland.
Pseudocode | Yes | We present the data extraction and cleaning procedure of TFix in Algorithm 1. (Algorithm 1 is presented in a structured block.)
Open Source Code | Yes | The code, trained model, and dataset can be found at https://github.com/eth-sri/TFix.
Open Datasets | Yes | The code, trained model, and dataset can be found at https://github.com/eth-sri/TFix. We ran Algorithm 1 on 5.5 million commits obtained from the top 500k GitHub public repositories based on the number of stars and extracted a dataset of fixes for 52 error types detected by ESLint.
Dataset Splits | Yes | To create a train-test split, we randomly selected 10% of the samples for each error type as the test set (we call it clean test). The remainder was further split into 90% for fine-tuning and 10% for validation.
Hardware Specification | Yes | TFix was fine-tuned on 8 GPUs (GeForce RTX 2080 Ti) with a batch size of 32.
Software Dependencies | No | The paper mentions 'T5 large' as the model and the 'transformers' library for the implementation, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | For fine-tuning, we used Adam (Kingma & Ba, 2015) with the learning rate initialized to 10^-4. We set warm-up iterations to 500 and applied linear learning rate scheduling. TFix was fine-tuned on 8 GPUs (GeForce RTX 2080 Ti) with a batch size of 32. The fine-tuning ran for 30 epochs, which took 3-4 days, and applied validation after each epoch. During inference, we used beam search with a beam size of five.
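
To make the pseudocode and open-datasets rows above more concrete, here is a minimal sketch of the core idea behind Algorithm 1: run ESLint on the pre- and post-commit versions of a file and keep the pair when an error reported before the commit no longer appears after it. The function names and the flat (rule, message) representation are assumptions made for illustration; the full commit mining, localization, and cleaning procedure is given in the paper and the authors' repository.

    import json
    import subprocess

    def eslint_errors(path):
        # Run ESLint with JSON output and collect the reported (rule, message) pairs.
        out = subprocess.run(["eslint", "--format", "json", path],
                             capture_output=True, text=True)
        results = json.loads(out.stdout or "[]")
        return {(m["ruleId"], m["message"])
                for r in results for m in r.get("messages", [])}

    def extract_fixes(old_path, new_path):
        # Errors present before the commit but absent after it are treated as fixed.
        fixed = eslint_errors(old_path) - eslint_errors(new_path)
        return [(rule, msg, old_path, new_path) for rule, msg in fixed]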
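
The dataset-split row corresponds to a straightforward per-error-type split. The sketch below assumes the samples are grouped by error type in a dictionary; the exact grouping and shuffling code may differ from the released implementation.

    import random

    def split_dataset(samples_by_type, seed=0):
        rng = random.Random(seed)
        train, val, test = [], [], []
        for error_type, samples in samples_by_type.items():
            samples = list(samples)
            rng.shuffle(samples)
            n_test = int(0.10 * len(samples))   # 10% per error type -> clean test set
            test.extend(samples[:n_test])
            rest = samples[n_test:]
            n_val = int(0.10 * len(rest))       # 10% of the remainder -> validation
            val.extend(rest[:n_val])
            train.extend(rest[n_val:])          # remaining 90% -> fine-tuning
        return train, val, test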
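
The experiment-setup row maps naturally onto the Hugging Face transformers Trainer API. The following is a sketch rather than the authors' training script: argument names may differ slightly across transformers versions, and the per-device batch size of 4 is inferred from the effective batch size of 32 spread over 8 GPUs.

    from transformers import (Seq2SeqTrainer, Seq2SeqTrainingArguments,
                              T5ForConditionalGeneration, T5Tokenizer)

    model = T5ForConditionalGeneration.from_pretrained("t5-large")
    tokenizer = T5Tokenizer.from_pretrained("t5-large")

    args = Seq2SeqTrainingArguments(
        output_dir="tfix-t5-large",
        learning_rate=1e-4,               # Adam with learning rate initialized to 10^-4
        warmup_steps=500,                 # 500 warm-up iterations
        lr_scheduler_type="linear",       # linear learning rate scheduling
        per_device_train_batch_size=4,    # 8 GPUs x 4 = effective batch size of 32
        num_train_epochs=30,              # 30 epochs, validation after each epoch
        evaluation_strategy="epoch",
        predict_with_generate=True,
        generation_num_beams=5,           # beam search with beam size five
    )

    # trainer = Seq2SeqTrainer(model=model, args=args, tokenizer=tokenizer,
    #                          train_dataset=train_data, eval_dataset=val_data)
    # trainer.train()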
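
Finally, at inference time TFix treats program repair as text-to-text generation: the error type, error message, and the surrounding code are serialized into a single input string, and the fixed code is decoded with beam search (beam size five). The prompt layout below is illustrative only, and "t5-large" stands in for a TFix checkpoint fine-tuned as described above; the exact serialization is defined in the authors' repository.

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-large")
    model = T5ForConditionalGeneration.from_pretrained("t5-large")

    error_type = "no-unused-vars"                     # ESLint rule id (example)
    error_message = "'x' is defined but never used."  # ESLint message (example)
    buggy_context = "function f() {\n  const x = 1;\n  return 2;\n}"

    # Serialize error metadata and code context into one input string.
    source_text = f"fix {error_type} {error_message}:\n{buggy_context}"
    inputs = tokenizer(source_text, return_tensors="pt", truncation=True)

    # Decode a candidate fix with beam search, beam size five.
    outputs = model.generate(**inputs, num_beams=5, max_length=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))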