Self-Supervised Bug Detection and Repair

Authors: Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A Python implementation of BUGLAB improves by up to 30% upon baseline methods on a test dataset of 2374 real-life bugs and finds 19 previously unknown bugs in open-source software.
Researcher Affiliation | Industry | Miltiadis Allamanis, Henry Jackson-Flux, Marc Brockschmidt; Microsoft Research, Cambridge, UK; {miallama, mabrocks}@microsoft.com
Pseudocode | Yes | Algorithm 1: Sequential Training Procedure for Selector and Detector models (see the training-loop sketch after this table).
Open Source Code | Yes | We release PyPIBugs and our code at https://github.com/microsoft/neurips21-self-supervised-bug-detection-and-repair.
Open Datasets | Yes | We release PyPIBugs and our code at https://github.com/microsoft/neurips21-self-supervised-bug-detection-and-repair. We publish the dataset at https://www.microsoft.com/en-us/download/103554 and include it in the supplementary material.
Dataset Splits | No | For the fully supervised model, we train with early stopping over a validation set; the other models are trained for a fixed number of 300 epochs (with 200k training samples per epoch) for the bug detector, and the last detector model is used for evaluation. Although a validation set is mentioned, its size or specific split ratio is not provided in the main text.
Hardware Specification | Yes | This amounts to about 1.5 weeks for the GNN models and about 1 week for the GREAT models on a single P100 GPU.
Software Dependencies | No | Finally, we would like to thank the contributors to the following open-source tools used: PyTorch [20], PyDriller [27], MessagePack, LibCST, Jedi, Kubernetes, Helm. Although software is mentioned, specific version numbers for these dependencies are not provided.
Experiment Setup | Yes | Architecture details and hyperparameter choices can be found in Appx. A. For the fully supervised model, we train with early stopping over a validation set; the other models are trained for a fixed number of 300 epochs (with 200k training samples per epoch) for the bug detector.
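The Pseudocode and Experiment Setup rows refer to the paper's Algorithm 1 (a sequential training procedure for a bug-selector model and a bug-detector model) and to its training budget of 300 epochs with roughly 200k samples per epoch. The sketch below is a minimal toy illustration of that alternating selector/detector idea, not the released implementation: the random feature vectors standing in for code, the linear models, the batch size of 256, the per-batch alternation, and the REINFORCE-style selector objective are all assumptions made for illustration.

```python
# Toy sketch of BugLab-style selector/detector co-training (NOT the authors' code).
# Assumptions: snippets are random feature vectors, models are linear layers,
# batch size 256, per-batch alternation, REINFORCE-style selector objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_REWRITES = 8        # candidate bug-inducing rewrites per snippet (assumed)
FEATURE_DIM = 32        # stand-in for a learned code representation (assumed)
NO_BUG = NUM_REWRITES   # extra detector class: "snippet is not buggy"
BATCH = 256             # assumed batch size

selector = nn.Linear(FEATURE_DIM, NUM_REWRITES)      # scores candidate rewrites
detector = nn.Linear(FEATURE_DIM, NUM_REWRITES + 1)  # locates the bug (or NO_BUG)
opt_sel = torch.optim.Adam(selector.parameters(), lr=1e-3)
opt_det = torch.optim.Adam(detector.parameters(), lr=1e-3)

def sample_snippets(n: int) -> torch.Tensor:
    """Stand-in for drawing non-buggy code snippets from a corpus."""
    return torch.randn(n, FEATURE_DIM)

def apply_rewrite(snippets: torch.Tensor, rewrite_ids: torch.Tensor) -> torch.Tensor:
    """Stand-in for applying the chosen bug-inducing rewrite to each snippet."""
    return snippets + F.one_hot(rewrite_ids, FEATURE_DIM).float()

for epoch in range(300):                 # 300 epochs, as quoted above
    for _ in range(200_000 // BATCH):    # ~200k training samples per epoch
        snippets = sample_snippets(BATCH)

        # 1) Selector proposes a bug-inducing rewrite for each snippet.
        with torch.no_grad():
            rewrite_ids = torch.distributions.Categorical(
                logits=selector(snippets)).sample()
        buggy = apply_rewrite(snippets, rewrite_ids)

        # 2) Train the detector to locate the inserted bug,
        #    and to predict NO_BUG on the unmodified snippets.
        det_logits = detector(torch.cat([buggy, snippets]))
        det_labels = torch.cat(
            [rewrite_ids, torch.full((BATCH,), NO_BUG, dtype=torch.long)])
        det_loss = F.cross_entropy(det_logits, det_labels)
        opt_det.zero_grad()
        det_loss.backward()
        opt_det.step()

        # 3) Train the selector to propose bugs the detector finds hard
        #    (reward = the detector's per-sample loss on the chosen rewrite).
        with torch.no_grad():
            reward = F.cross_entropy(detector(buggy), rewrite_ids, reduction="none")
        sel_log_prob = F.log_softmax(selector(snippets), dim=-1)
        chosen = sel_log_prob.gather(1, rewrite_ids[:, None]).squeeze(1)
        sel_loss = -(reward * chosen).mean()
        opt_sel.zero_grad()
        sel_loss.backward()
        opt_sel.step()
```

The per-batch alternation above is a simplification: the paper's Algorithm 1 interleaves selector and detector training sequentially, and its models are GNN and GREAT architectures over program representations, so the exact schedule, objectives, and representations should be taken from the paper and the released code rather than this sketch.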