On Distribution Shift in Learning-based Bug Detectors
Authors: Jingxuan He, Luca Beurer-Kellner, Martin Vechev
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our test set and the latest version of open source repositories. |
| Researcher Affiliation | Academia | Jingxuan He¹, Luca Beurer-Kellner¹, Martin Vechev¹ (¹Department of Computer Science, ETH Zurich, Switzerland). |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our code, datasets, and models are publicly available at https://github.com/eth-sri/learning-real-bug-detector. |
| Open Datasets | Yes | Our code, datasets, and models are publicly available at https://github.com/eth-sri/learning-real-bug-detector. |
| Dataset Splits | Yes | Moreover, we obtain an imbalanced dataset with real bugs, which is randomly split into real-train (used for the second training phase), real-val (used as the validation set), and real-test (used as the blind test set). The split ratio is 0.5:0.25:0.25. |
| Hardware Specification | Yes | Our experiments were done on servers with NVIDIA RTX 2080 Ti and NVIDIA TITAN X GPUs. |
| Software Dependencies | No | The paper mentions using Hugging Face and PyTorch but does not specify version numbers for either. |
| Experiment Setup | Yes | The number of epochs, learning rate, and the time cost of the two training phases are shown in Table 9. Both training phases require at most two epochs to achieve good performance, highlighting the power of pretrained models to quickly adapt to new tasks and data distributions. In each batch, we feed two samples into the model as larger batch size will cause out-of-memory errors. |
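The 0.5:0.25:0.25 random split into real-train, real-val, and real-test described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, seed, and use of Python's standard `random` module are assumptions.

```python
import random

def split_dataset(samples, seed=0):
    """Randomly split samples into real-train / real-val / real-test
    with the 0.5:0.25:0.25 ratio described in the paper.
    The function name and fixed seed are illustrative choices."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.5 * n)
    n_val = int(0.25 * n)
    return (
        shuffled[:n_train],                 # real-train: second training phase
        shuffled[n_train:n_train + n_val],  # real-val: validation set
        shuffled[n_train + n_val:],         # real-test: blind test set
    )
```

A fixed seed makes the split reproducible across runs, which matters when real-test is meant to stay blind.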