Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection

Authors: Tue Le, Tuan Nguyen, Trung Le, Dinh Phung, Paul Montague, Olivier De Vel, Lizhen Qu

ICLR 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted extensive experiments to compare and contrast our proposed methods with the baselines, and the results indicate that our proposed methods outperform the baselines in all performance measures of interest. |
| Researcher Affiliation | Collaboration | Tue Le, Tuan Nguyen (AI Research Lab, Trusting Social, Australia; {tue.le, tuan.nguyen}@trustingsocial.com); Trung Le, Dinh Phung (Monash University, Australia; {trunglm, dinh.phung}@monash.edu); Paul Montague, Olivier De Vel (Defence Science and Technology Group, Department of Defence, Australia; {paul.montague, olivier.devel}@dst.defence.gov.au); Lizhen Qu (Data61, CSIRO, Australia; lizhen.qu@data61.csiro.au) |
| Pseudocode | No | No pseudocode or explicit algorithm blocks found. |
| Open Source Code | Yes | The source code, as well as the dataset, is available in our GitHub repository: https://github.com/dascimal-org/MDSeqVAE |
| Open Datasets | Yes | One of our most significant contributions is to create a labeled dataset for use in binary code vulnerability detection. ... The source code, as well as the dataset, is available in our GitHub repository: https://github.com/dascimal-org/MDSeqVAE |
| Dataset Splits | Yes | We split the data into 80% for training, 10% for validation, and the remaining 10% for testing. |
| Hardware Specification | Yes | We ran our experiments on a computer with an Intel Xeon Processor E5-1660, which had 8 cores at 3.0 GHz and 128 GB of RAM. |
| Software Dependencies | No | We implemented our proposed method in Python using Tensorflow (Abadi et al., 2016), an open-source software library for Machine Intelligence developed by the Google Brain Team. (No library versions are specified.) |
| Experiment Setup | Yes | For the RNN baselines and our models, the hidden size was set to 256. For our model, the size of the latent space was set to 4,096, and the trade-off parameters α and β were set to 2×10⁻² and 10⁻⁴, respectively. We used the Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 0.0001. The minibatch size was set to 64 and the number of epochs was set to 100. |
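As a quick illustration of the reported configuration, the sketch below reproduces the 80/10/10 train/validation/test split and collects the stated hyperparameters into a single config object. The names `MDSAEConfig` and `split_dataset` are hypothetical and are not taken from the authors' repository; the values are copied from the quotes above, and the α/β exponents follow the reconstruction in the table.

```python
# Sketch of the reported experimental setup (values quoted from the paper's
# reproducibility details). MDSAEConfig and split_dataset are illustrative
# names, not part of the authors' released code.
from dataclasses import dataclass

import numpy as np


@dataclass
class MDSAEConfig:
    hidden_size: int = 256        # RNN hidden units (baselines and proposed models)
    latent_size: int = 4096       # size of the latent space
    alpha: float = 2e-2           # trade-off parameter alpha (assumed 2 x 10^-2)
    beta: float = 1e-4            # trade-off parameter beta (assumed 10^-4)
    learning_rate: float = 1e-4   # initial learning rate for the Adam optimizer
    batch_size: int = 64          # minibatch size
    num_epochs: int = 100         # number of training epochs


def split_dataset(n_samples: int, seed: int = 0):
    """Return index arrays for an 80/10/10 train/validation/test split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.8 * n_samples)
    n_val = int(0.1 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


if __name__ == "__main__":
    cfg = MDSAEConfig()
    train_idx, val_idx, test_idx = split_dataset(10_000)
    print(cfg)
    print(len(train_idx), len(val_idx), len(test_idx))  # 8000 1000 1000
```

The actual model and training loop are available in the authors' GitHub repository linked above; this sketch only records the hyperparameter values they report.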