Self-Supervised Bug Detection and Repair
Authors: Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A Python implementation of BUGLAB improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs and finds 19 previously unknown bugs in open-source software. |
| Researcher Affiliation | Industry | Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt; Microsoft Research, Cambridge, UK; {miallama, mabrocks}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Sequential Training Procedure for Selector and Detector models |
| Open Source Code | Yes | We release PyPIBugs and our code at https://github.com/microsoft/neurips21-self-supervised-bug-detection-and-repair. |
| Open Datasets | Yes | We release PyPIBugs and our code at https://github.com/microsoft/neurips21-self-supervised-bug-detection-and-repair. We publish the dataset at https://www.microsoft.com/en-us/download/103554 and include it in the supplementary material. |
| Dataset Splits | No | For the fully supervised model, we train with early stopping over a validation set; the other models are trained for a fixed number of 300 epochs (with 200k training samples per epoch) for the bug detector and the last detector model is used for evaluation. Although a validation set is mentioned, its size or specific split ratio is not provided in the main text. |
| Hardware Specification | Yes | This amounts to about 1.5 weeks for the GNN models and about 1 week for the GREAT models on a single P100 GPU. |
| Software Dependencies | No | Finally, we would like to thank the contributors to the following open-source tools used: PyTorch [20], PyDriller [27], MessagePack, LibCST, Jedi, Kubernetes, Helm. Although software is mentioned, specific version numbers for these dependencies are not provided. |
| Experiment Setup | Yes | architecture details and hyperparameter choices can be found in Appx. A. For the fully supervised model, we train with early stopping over a validation set; the other models are trained for a fixed number of 300 epochs (with 200k training samples per epoch) for the bug detector. |
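
The two training regimes quoted in the Experiment Setup row can be contrasted with a minimal sketch. This is not the authors' code: the function names, the `patience` parameter, and the callback signatures are assumptions made for illustration, and only the figures of 300 epochs and 200k samples per epoch come from the quoted text.

```python
from typing import Callable

# Figures quoted in the table; everything else below is illustrative.
EPOCHS = 300
SAMPLES_PER_EPOCH = 200_000


def total_training_samples(epochs: int = EPOCHS,
                           samples_per_epoch: int = SAMPLES_PER_EPOCH) -> int:
    """Number of training samples seen under the fixed-epoch regime."""
    return epochs * samples_per_epoch


def train_with_early_stopping(train_epoch: Callable[[int], None],
                              validate: Callable[[int], float],
                              max_epochs: int,
                              patience: int) -> int:
    """Train until validation loss stops improving; return the best epoch.

    `patience` is a hypothetical hyperparameter; the quoted text does not
    say how the stopping criterion is configured.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch in range(max_epochs):
        train_epoch(epoch)
        loss = validate(epoch)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch


def train_fixed(train_epoch: Callable[[int], None],
                epochs: int = EPOCHS) -> int:
    """Train for a fixed number of epochs; the last model is the one kept."""
    for epoch in range(epochs):
        train_epoch(epoch)
    return epochs - 1
```

Under the fixed-epoch regime the detector sees `total_training_samples()` samples, that is 300 × 200,000 = 60,000,000, and no validation set is consulted when choosing which model to evaluate.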