Learning to Repair Software Vulnerabilities with Generative Adversarial Networks
Authors: Jacob Harer, Onur Ozdemir, Tomo Lazovich, Christopher Reale, Rebecca Russell, Louis Kim, Peter Chin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We start our experiments by exploring two hand-curated datasets, namely sequences of sorted numbers and a Context-Free Grammar (CFG), which help highlight the benefits of our proposed GAN approach to address the domain mapping problem. We then investigate the harder problem of correcting errors in C/C++ code. All of our results are given in Table 1. (A toy-data sketch for the sequence tasks follows the table.) |
| Researcher Affiliation | Collaboration | Jacob A. Harer1,2, Onur Ozdemir1, Tomo Lazovich3, Christopher P. Reale1, Rebecca L. Russell1, Louis Y. Kim1, Peter Chin2; 1Draper, Cambridge, MA; 2Department of Computer Science, Boston University, Boston, MA; 3Lightmatter, Boston, MA. {jharer,oozdemir,creale,rrussell,lkim}@draper.com, tomo@lightmatter.ai, spchin@cs.bu.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | SATE IV is a dataset which contains C/C++ synthetic code examples (functions) with vulnerabilities from 116 different Common Weakness Enumeration (CWE) classes, and was originally designed to explore performance of static and dynamic analyzers [38]. ... [38] V. Okun, A. Delaitre, and P. E. Black. Report on the Static Analysis Tool Exposition (SATE) IV. Technical report, 2013. |
| Dataset Splits | Yes | We use a 80/10/10% train/validation/test split. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper describes the neural network architectures and components (e.g., LSTMs, attention mechanism) but does not provide specific ancillary software details with version numbers (e.g., library names with versions like Python 3.8, PyTorch 1.9, CUDA version). |
| Experiment Setup | Yes | Specifically we train the generator with the loss function: $\mathcal{L}_{\text{AUTO-PRE}}(G) = \mathbb{E}_{y \sim P(y)}\left[-y \log(G(\hat{y}))\right]$ (9), where $\hat{y}$ is the noisy version of the input created by dropping tokens in $y$ with probability 0.2 and randomly inserting and deleting $n$ tokens, where $n$ is 0.03 times the sequence length. These numbers were selected based on hyperparameter tuning. (A noising sketch follows the table.) |
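To make the toy domain-mapping tasks concrete, here is a minimal sketch of generating the two unpaired domains for the sorted-numbers experiment. The paper only names the task; the generation procedure, sequence length, and vocabulary size below are assumptions, and `make_sorting_domains` is a hypothetical helper.

```python
import random

def make_sorting_domains(num_examples=10000, seq_len=10, vocab_size=100, seed=0):
    """Two unpaired domains for the toy task: random ('bad') sequences and
    sorted ('good') sequences. All generation details here are assumptions."""
    rng = random.Random(seed)
    bad = [[rng.randrange(vocab_size) for _ in range(seq_len)]
           for _ in range(num_examples)]
    good = [sorted(rng.randrange(vocab_size) for _ in range(seq_len))
            for _ in range(num_examples)]
    return bad, good

bad, good = make_sorting_domains()
print(bad[0], good[0])
```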
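The 80/10/10% train/validation/test split could be implemented as below. The proportions come from the paper; the shuffle, seed, and function name are assumptions.

```python
import random

def split_80_10_10(examples, seed=0):
    """80/10/10 train/validation/test split, as stated in the paper.
    Shuffling with a fixed seed is an assumption, not specified there."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```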
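Finally, a minimal sketch of the noising step used to build $\hat{y}$ for autoencoder pretraining, using the paper's numbers (drop probability 0.2, $n = 0.03 \times$ sequence length for insertions and deletions). Drawing inserted tokens uniformly from the vocabulary, and the helper name `noise_tokens`, are assumptions.

```python
import random

def noise_tokens(y, vocab, drop_prob=0.2, ins_del_rate=0.03, seed=None):
    """Build y_hat from token sequence y: drop each token with probability
    drop_prob, then randomly delete and insert n tokens, n = 0.03 * len(y).
    Uniform sampling of inserted tokens from `vocab` is an assumption."""
    rng = random.Random(seed)
    # Drop tokens independently with probability drop_prob.
    noisy = [tok for tok in y if rng.random() >= drop_prob]
    n = max(1, round(ins_del_rate * len(y)))
    # Delete n tokens at random positions.
    for _ in range(n):
        if noisy:
            del noisy[rng.randrange(len(noisy))]
    # Insert n random vocabulary tokens at random positions.
    for _ in range(n):
        noisy.insert(rng.randrange(len(noisy) + 1), rng.choice(vocab))
    return noisy
```

The generator is then pretrained to reconstruct the clean sequence $y$ from this noisy $\hat{y}$ under the cross-entropy loss in equation (9).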