On Distribution Shift in Learning-based Bug Detectors

Authors: Jingxuan He, Luca Beurer-Kellner, Martin Vechev

ICML 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our test set and the latest version of open source repositories. |
| Researcher Affiliation | Academia | Jingxuan He¹, Luca Beurer-Kellner¹, Martin Vechev¹. ¹Department of Computer Science, ETH Zurich, Switzerland. |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our code, datasets, and models are publicly available at https://github.com/eth-sri/learning-real-bug-detector. |
| Open Datasets | Yes | Our code, datasets, and models are publicly available at https://github.com/eth-sri/learning-real-bug-detector. |
| Dataset Splits | Yes | Moreover, we obtain an imbalanced dataset with real bugs, which is randomly split into real-train (used for the second training phase), real-val (used as the validation set), and real-test (used as the blind test set). The split ratio is 0.5:0.25:0.25. (See the split sketch below the table.) |
| Hardware Specification | Yes | Our experiments were done on servers with NVIDIA RTX 2080 Ti and NVIDIA TITAN X GPUs. |
| Software Dependencies | No | The paper mentions using 'Hugging Face' and 'PyTorch' but does not specify their version numbers. |
| Experiment Setup | Yes | The number of epochs, learning rate, and the time cost of the two training phases are shown in Table 9. Both training phases require at most two epochs to achieve good performance, highlighting the power of pretrained models to quickly adapt to new tasks and data distributions. In each batch, we feed two samples into the model, as a larger batch size will cause out-of-memory errors. (See the training sketch below the table.) |
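
The Dataset Splits row reports a random 0.5:0.25:0.25 split into real-train, real-val, and real-test. As a concrete illustration, here is a minimal Python sketch of such a split; the function name, seed, and the assumption that samples arrive as a flat list are ours, not taken from the authors' released code:

```python
import random

def split_real_dataset(samples, seed=0):
    """Randomly split an imbalanced real-bug dataset 0.5:0.25:0.25
    into real-train, real-val, and real-test (names hypothetical)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n // 2           # 0.5: used for the second training phase
    n_val = n // 4             # 0.25: used as the validation set
    real_train = shuffled[:n_train]
    real_val = shuffled[n_train:n_train + n_val]
    real_test = shuffled[n_train + n_val:]  # remaining ~0.25: blind test set
    return real_train, real_val, real_test
```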
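
The Experiment Setup row describes two short fine-tuning phases with a batch size of 2. The sketch below shows what such a loop could look like with PyTorch and Hugging Face Transformers; the checkpoint name, learning rate, and dataset objects are placeholders (the paper's actual values appear in its Table 9), so this is an illustration of the setup rather than the authors' implementation:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForSequenceClassification

def train_phase(model, dataset: Dataset, epochs: int, lr: float):
    """One fine-tuning phase. Batch size is fixed at 2 because the
    paper reports that larger batches caused out-of-memory errors."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=2, shuffle=True)
    model.train()
    for _ in range(epochs):
        for batch in loader:            # batch: dict of tokenized tensors
            optimizer.zero_grad()
            loss = model(**batch).loss  # labels are included in the batch
            loss.backward()
            optimizer.step()

# Usage sketch (checkpoint and dataset names are hypothetical placeholders):
# model = AutoModelForSequenceClassification.from_pretrained(
#     "microsoft/codebert-base", num_labels=2)
# train_phase(model, synthetic_train, epochs=2, lr=2e-5)  # phase 1: synthetic bugs
# train_phase(model, real_train, epochs=2, lr=2e-5)       # phase 2: real bugs
```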