Learning to Repair Software Vulnerabilities with Generative Adversarial Networks
Authors: Jacob Harer, Onur Ozdemir, Tomo Lazovich, Christopher Reale, Rebecca Russell, Louis Kim, Peter Chin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We start our experiments by exploring two hand-curated datasets, namely sequences of sorted numbers and a Context-Free Grammar (CFG), which help highlight the benefits of our proposed GAN approach to address the domain mapping problem. We then investigate the harder problem of correcting errors in C/C++ code. All of our results are given in Table 1. (A toy-data sketch for the sequence tasks follows the table.) |
| Researcher Affiliation | Collaboration | Jacob A. Harer1,2, Onur Ozdemir1, Tomo Lazovich3, Christopher P. Reale1, Rebecca L. Russell1, Louis Y. Kim1, Peter Chin2; 1Draper, Cambridge, MA; 2Department of Computer Science, Boston University, Boston, MA; 3Lightmatter, Boston, MA. {jharer,oozdemir,creale,rrussell,lkim}@draper.com, tomo@lightmatter.ai, spchin@cs.bu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | SATE IV is a dataset which contains C/C++ synthetic code examples (functions) with vulnerabilities from 116 different Common Weakness Enumeration (CWE) classes, and was originally designed to explore performance of static and dynamic analyzers [38]. ... [38] V. Okun, A. Delaitre, and P. E. Black. Report on the Static Analysis Tool Exposition (SATE) IV. Technical report, 2013. |
| Dataset Splits | Yes | We use a 80/10/10% train/validation/test split. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper describes the neural network architectures and components (e.g., LSTMs, attention mechanism) but does not provide specific ancillary software details with version numbers (e.g., library names with versions like Python 3.8, PyTorch 1.9, CUDA version). |
| Experiment Setup | Yes | Specifically we train the generator with the loss function: $\mathcal{L}_{\text{AUTO-PRE}}(G) = \mathbb{E}_{y \sim P(y)}\left[-y \log(G(\hat{y}))\right]$ (9), where $\hat{y}$ is the noisy version of the input created by dropping tokens in $y$ with probability 0.2 and randomly inserting and deleting $n$ tokens, where $n$ is 0.03 times the sequence length. These numbers were selected based on hyperparameter tuning. (A noising sketch follows the table.) |
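To make the toy domain-mapping tasks concrete, here is a minimal sketch of generating the two unpaired domains for the sorted-numbers experiment. The paper only names the task; the generation procedure, sequence length, and vocabulary size below are assumptions, and `make_sorting_domains` is a hypothetical helper.

```python
import random

def make_sorting_domains(num_examples=10000, seq_len=10, vocab_size=100, seed=0):
    """Two unpaired domains for the toy task: random ('bad') sequences and
    sorted ('good') sequences. All generation details here are assumptions."""
    rng = random.Random(seed)
    bad = [[rng.randrange(vocab_size) for _ in range(seq_len)]
           for _ in range(num_examples)]
    good = [sorted(rng.randrange(vocab_size) for _ in range(seq_len))
            for _ in range(num_examples)]
    return bad, good

bad, good = make_sorting_domains()
print(bad[0], good[0])
```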
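The 80/10/10% train/validation/test split could be implemented as below. The proportions come from the paper; the shuffle, seed, and function name are assumptions.

```python
import random

def split_80_10_10(examples, seed=0):
    """80/10/10 train/validation/test split, as stated in the paper.
    Shuffling with a fixed seed is an assumption, not specified there."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```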
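Finally, a minimal sketch of the noising step used to build $\hat{y}$ for autoencoder pretraining, using the paper's numbers (drop probability 0.2, $n = 0.03 \times$ sequence length for insertions and deletions). Drawing inserted tokens uniformly from the vocabulary, and the helper name `noise_tokens`, are assumptions.

```python
import random

def noise_tokens(y, vocab, drop_prob=0.2, ins_del_rate=0.03, seed=None):
    """Build y_hat from token sequence y: drop each token with probability
    drop_prob, then randomly delete and insert n tokens, n = 0.03 * len(y).
    Uniform sampling of inserted tokens from `vocab` is an assumption."""
    rng = random.Random(seed)
    # Drop tokens independently with probability drop_prob.
    noisy = [tok for tok in y if rng.random() >= drop_prob]
    n = max(1, round(ins_del_rate * len(y)))
    # Delete n tokens at random positions.
    for _ in range(n):
        if noisy:
            del noisy[rng.randrange(len(noisy))]
    # Insert n random vocabulary tokens at random positions.
    for _ in range(n):
        noisy.insert(rng.randrange(len(noisy) + 1), rng.choice(vocab))
    return noisy
```

The generator is then pretrained to reconstruct the clean sequence $y$ from this noisy $\hat{y}$ under the cross-entropy loss in equation (9).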