TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer

Authors: Berkay Berabi, Jingxuan He, Veselin Raychev, Martin Vechev

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation on a massive dataset of JavaScript programs shows that TFix is practically effective: it is able to synthesize code that fixes the error in 67% of cases and significantly outperforms existing learning-based approaches. We conduct an extensive evaluation of TFix on a massive dataset of JavaScript programs to fix 52 error types detected by ESLint.
Researcher Affiliation | Collaboration | Berkay Berabi (1,2), Jingxuan He (1), Veselin Raychev (1,2), Martin Vechev (1). 1: Department of Computer Science, ETH Zurich, Switzerland; 2: Snyk, Switzerland.
Pseudocode | Yes | We present the data extraction and cleaning procedure of TFix in Algorithm 1. (Algorithm 1 is presented in a structured block.)
Open Source Code | Yes | The code, trained model, and dataset can be found at https://github.com/eth-sri/TFix.
Open Datasets | Yes | The code, trained model, and dataset can be found at https://github.com/eth-sri/TFix. We ran Algorithm 1 on 5.5 million commits obtained from the top 500k GitHub public repositories based on the number of stars and extracted a dataset of fixes for 52 error types detected by ESLint.
Dataset Splits | Yes | To create a train-test split, we randomly selected 10% of the samples for each error type as the test set (we call it clean test). The remainder was further split into 90% for fine-tuning and 10% for validation.
Hardware Specification | Yes | TFix was fine-tuned on 8 GPUs (GeForce RTX 2080 Ti) with a batch size of 32.
Software Dependencies | No | The paper mentions 'T5 large' as the model and the 'transformers' library for the implementation, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | For fine-tuning, we used Adam (Kingma & Ba, 2015) with the learning rate initialized to 10^-4. We set warm-up iterations to 500 and applied linear learning rate scheduling. TFix was fine-tuned on 8 GPUs (GeForce RTX 2080 Ti) with a batch size of 32. The fine-tuning ran for 30 epochs, which took 3-4 days, and applied validation after each epoch. During inference, we used beam search with a beam size of five.
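
To make the pseudocode and open-datasets rows above more concrete, here is a minimal sketch of the core idea behind Algorithm 1: run ESLint on the pre- and post-commit versions of a file and keep the pair when an error reported before the commit no longer appears after it. The function names and the flat (rule, message) representation are assumptions made for illustration; the full commit mining, localization, and cleaning procedure is given in the paper and the authors' repository.

    import json
    import subprocess

    def eslint_errors(path):
        # Run ESLint with JSON output and collect the reported (rule, message) pairs.
        out = subprocess.run(["eslint", "--format", "json", path],
                             capture_output=True, text=True)
        results = json.loads(out.stdout or "[]")
        return {(m["ruleId"], m["message"])
                for r in results for m in r.get("messages", [])}

    def extract_fixes(old_path, new_path):
        # Errors present before the commit but absent after it are treated as fixed.
        fixed = eslint_errors(old_path) - eslint_errors(new_path)
        return [(rule, msg, old_path, new_path) for rule, msg in fixed]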
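
The dataset-split row corresponds to a straightforward per-error-type split. The sketch below assumes the samples are grouped by error type in a dictionary; the exact grouping and shuffling code may differ from the released implementation.

    import random

    def split_dataset(samples_by_type, seed=0):
        rng = random.Random(seed)
        train, val, test = [], [], []
        for error_type, samples in samples_by_type.items():
            samples = list(samples)
            rng.shuffle(samples)
            n_test = int(0.10 * len(samples))   # 10% per error type -> clean test set
            test.extend(samples[:n_test])
            rest = samples[n_test:]
            n_val = int(0.10 * len(rest))       # 10% of the remainder -> validation
            val.extend(rest[:n_val])
            train.extend(rest[n_val:])          # remaining 90% -> fine-tuning
        return train, val, test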
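
The experiment-setup row maps naturally onto the Hugging Face transformers Trainer API. The following is a sketch rather than the authors' training script: argument names may differ slightly across transformers versions, and the per-device batch size of 4 is inferred from the effective batch size of 32 spread over 8 GPUs.

    from transformers import (Seq2SeqTrainer, Seq2SeqTrainingArguments,
                              T5ForConditionalGeneration, T5Tokenizer)

    model = T5ForConditionalGeneration.from_pretrained("t5-large")
    tokenizer = T5Tokenizer.from_pretrained("t5-large")

    args = Seq2SeqTrainingArguments(
        output_dir="tfix-t5-large",
        learning_rate=1e-4,               # Adam with learning rate initialized to 10^-4
        warmup_steps=500,                 # 500 warm-up iterations
        lr_scheduler_type="linear",       # linear learning rate scheduling
        per_device_train_batch_size=4,    # 8 GPUs x 4 = effective batch size of 32
        num_train_epochs=30,              # 30 epochs, validation after each epoch
        evaluation_strategy="epoch",
        predict_with_generate=True,
        generation_num_beams=5,           # beam search with beam size five
    )

    # trainer = Seq2SeqTrainer(model=model, args=args, tokenizer=tokenizer,
    #                          train_dataset=train_data, eval_dataset=val_data)
    # trainer.train()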
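
Finally, at inference time TFix treats program repair as text-to-text generation: the error type, error message, and the surrounding code are serialized into a single input string, and the fixed code is decoded with beam search (beam size five). The prompt layout below is illustrative only, and "t5-large" stands in for a TFix checkpoint fine-tuned as described above; the exact serialization is defined in the authors' repository.

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-large")
    model = T5ForConditionalGeneration.from_pretrained("t5-large")

    error_type = "no-unused-vars"                     # ESLint rule id (example)
    error_message = "'x' is defined but never used."  # ESLint message (example)
    buggy_context = "function f() {\n  const x = 1;\n  return 2;\n}"

    # Serialize error metadata and code context into one input string.
    source_text = f"fix {error_type} {error_message}:\n{buggy_context}"
    inputs = tokenizer(source_text, return_tensors="pt", truncation=True)

    # Decode a candidate fix with beam search, beam size five.
    outputs = model.generate(**inputs, num_beams=5, max_length=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))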