Self-Supervised Bug Detection and Repair

Authors: Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A Python implementation of BUGLAB improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs and finds 19 previously unknown bugs in open-source software.
Researcher Affiliation | Industry | Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt; Microsoft Research, Cambridge, UK; {miallama, mabrocks}@microsoft.com
Pseudocode | Yes | Algorithm 1: Sequential Training Procedure for Selector and Detector models (see the training-loop sketch after this table).
Open Source Code | Yes | We release PyPIBugs and our code at https://github.com/microsoft/neurips21-self-supervised-bug-detection-and-repair.
Open Datasets | Yes | We release PyPIBugs and our code at https://github.com/microsoft/neurips21-self-supervised-bug-detection-and-repair. We publish the dataset at https://www.microsoft.com/en-us/download/103554 and include it in the supplementary material.
Dataset Splits | No | For the fully supervised model, we train with early stopping over a validation set; the other models are trained for a fixed number of 300 epochs (with 200k training samples per epoch) for the bug detector, and the last detector model is used for evaluation. Although a validation set is mentioned, its size or specific split ratio is not provided in the main text.
Hardware Specification | Yes | This amounts to about 1.5 weeks for the GNN models and about 1 week for the GREAT models on a single P100 GPU.
Software Dependencies | No | Finally, we would like to thank the contributors to the following open-source tools used: PyTorch [20], PyDriller [27], MessagePack, LibCST, Jedi, Kubernetes, Helm. Although software is mentioned, specific version numbers for these dependencies are not provided.
Experiment Setup | Yes | Architecture details and hyperparameter choices can be found in Appx. A. For the fully supervised model, we train with early stopping over a validation set; the other models are trained for a fixed number of 300 epochs (with 200k training samples per epoch) for the bug detector.
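The Pseudocode and Experiment Setup rows refer to the paper's Algorithm 1 (a sequential training procedure for a bug-selector model and a bug-detector model) and to its training budget of 300 epochs with roughly 200k samples per epoch. The sketch below is a minimal toy illustration of that alternating selector/detector idea, not the released implementation: the random feature vectors standing in for code, the linear models, the batch size of 256, the per-batch alternation, and the REINFORCE-style selector objective are all assumptions made for illustration.

```python
# Toy sketch of BugLab-style selector/detector co-training (NOT the authors' code).
# Assumptions: snippets are random feature vectors, models are linear layers,
# batch size 256, per-batch alternation, REINFORCE-style selector objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_REWRITES = 8        # candidate bug-inducing rewrites per snippet (assumed)
FEATURE_DIM = 32        # stand-in for a learned code representation (assumed)
NO_BUG = NUM_REWRITES   # extra detector class: "snippet is not buggy"
BATCH = 256             # assumed batch size

selector = nn.Linear(FEATURE_DIM, NUM_REWRITES)      # scores candidate rewrites
detector = nn.Linear(FEATURE_DIM, NUM_REWRITES + 1)  # locates the bug (or NO_BUG)
opt_sel = torch.optim.Adam(selector.parameters(), lr=1e-3)
opt_det = torch.optim.Adam(detector.parameters(), lr=1e-3)

def sample_snippets(n: int) -> torch.Tensor:
    """Stand-in for drawing non-buggy code snippets from a corpus."""
    return torch.randn(n, FEATURE_DIM)

def apply_rewrite(snippets: torch.Tensor, rewrite_ids: torch.Tensor) -> torch.Tensor:
    """Stand-in for applying the chosen bug-inducing rewrite to each snippet."""
    return snippets + F.one_hot(rewrite_ids, FEATURE_DIM).float()

for epoch in range(300):                 # 300 epochs, as quoted above
    for _ in range(200_000 // BATCH):    # ~200k training samples per epoch
        snippets = sample_snippets(BATCH)

        # 1) Selector proposes a bug-inducing rewrite for each snippet.
        with torch.no_grad():
            rewrite_ids = torch.distributions.Categorical(
                logits=selector(snippets)).sample()
        buggy = apply_rewrite(snippets, rewrite_ids)

        # 2) Train the detector to locate the inserted bug,
        #    and to predict NO_BUG on the unmodified snippets.
        det_logits = detector(torch.cat([buggy, snippets]))
        det_labels = torch.cat(
            [rewrite_ids, torch.full((BATCH,), NO_BUG, dtype=torch.long)])
        det_loss = F.cross_entropy(det_logits, det_labels)
        opt_det.zero_grad()
        det_loss.backward()
        opt_det.step()

        # 3) Train the selector to propose bugs the detector finds hard
        #    (reward = the detector's per-sample loss on the chosen rewrite).
        with torch.no_grad():
            reward = F.cross_entropy(detector(buggy), rewrite_ids, reduction="none")
        sel_log_prob = F.log_softmax(selector(snippets), dim=-1)
        chosen = sel_log_prob.gather(1, rewrite_ids[:, None]).squeeze(1)
        sel_loss = -(reward * chosen).mean()
        opt_sel.zero_grad()
        sel_loss.backward()
        opt_sel.step()
```

The per-batch alternation above is a simplification: the paper's Algorithm 1 interleaves selector and detector training sequentially, and its models are GNN and GREAT architectures over program representations, so the exact schedule, objectives, and representations should be taken from the paper and the released code rather than this sketch.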