DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing

Authors: Xiao Liu, Xiaoting Li, Rupesh Prajapati, Dinghao Wu (pp. 1044-1051)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a detailed case study to analyze the success rate and coverage improvement of the generated C programs for fuzz testing. We analyze the performance of DEEPFUZZ with three types of sampling methods as well as three types of generation strategies. In our preliminary study, we found and reported 8 bugs of GCC, all of which are actively being addressed by developers.
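The summary above mentions "three types of sampling methods" without spelling them out. As a hedged illustration only, here is what greedy versus temperature-scaled sampling of the next character from a predicted distribution can look like in a character-level generator; the function name and strategy labels are hypothetical, not identifiers from the DeepFuzz code:

```python
import math
import random

def sample_next_char(probs, strategy="greedy", temperature=1.0, rng=None):
    """Pick the next character from a predicted distribution.

    `probs` maps characters to model probabilities.
      - "greedy": always take the most likely character
      - "sample": draw from the temperature-scaled distribution
    """
    rng = rng or random.Random()
    if strategy == "greedy":
        return max(probs, key=probs.get)
    # Apply temperature in log space, then re-normalise.
    logits = {c: math.log(p) / temperature for c, p in probs.items() if p > 0}
    m = max(logits.values())
    weights = {c: math.exp(l - m) for c, l in logits.items()}
    total = sum(weights.values())
    chars = list(weights)
    return rng.choices(chars, weights=[weights[c] / total for c in chars], k=1)[0]

probs = {"i": 0.6, "f": 0.3, ";": 0.1}
print(sample_next_char(probs))  # greedy is deterministic → 'i'
```

Lower temperatures concentrate probability mass on the most likely characters (approaching greedy behaviour); higher temperatures flatten the distribution and yield more diverse, but less well-formed, output.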
Researcher Affiliation | Academia | Xiao Liu, Xiaoting Li, Rupesh Prajapati, Dinghao Wu; College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA
Pseudocode | No | The paper includes mathematical equations for RNNs and LSTMs but does not provide any pseudocode or clearly labeled algorithm blocks for the DEEPFUZZ process or its specific methods.
Open Source Code | Yes | We have released the source code for public dissemination: https://github.com/s3team/DeepFuzz
Open Datasets | Yes | For the training data set, we adopted the original GCC test suite, where there are over 10,000 short, or small, programs that cover most of the features specified in the C11 standard. The training data set, which contains 10,000 well-formed C programs, was collected and sampled from the GCC test suites.
Dataset Splits | No | The paper describes how training sequences are formed and discusses training for a certain number of epochs, but it does not specify an explicit validation dataset or how the data is split into training/validation sets.
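The row above refers to how training sequences are formed. For a character-level model trained on C source, the standard construction is a sliding window: each fixed-length prefix predicts the character that follows it. The sketch below illustrates that idea only; the function name and the window length are illustrative assumptions, not values taken from the paper:

```python
def make_training_pairs(source: str, seq_len: int = 50):
    """Slide a fixed-length window over C source text, yielding
    (prefix, next_char) pairs for character-level training.
    `seq_len` is an illustrative choice, not the paper's setting."""
    pairs = []
    for i in range(len(source) - seq_len):
        pairs.append((source[i:i + seq_len], source[i + seq_len]))
    return pairs

code = "int main() { return 0; }"
pairs = make_training_pairs(code, seq_len=10)
print(pairs[0])  # → ('int main()', ' ')
```

A held-out validation split would normally be carved out of such pairs (or out of whole programs, to avoid leakage between overlapping windows); the paper does not report doing either, which is why this row is marked No.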
Hardware Specification | Yes | We trained the model for 50 epochs on a server machine with a 2.90GHz Intel Xeon(R) E5-2690 CPU and 128GB memory. Acknowledgement: We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Software Dependencies | No | The paper mentions using gcc and gcov for compilation and coverage collection but does not provide version numbers for these or for any other software libraries or frameworks used to implement DEEPFUZZ.
Experiment Setup | Yes | We trained a Sequence-to-Sequence model with 2 layers and 512 LSTM units per layer. We set a dropout rate of 0.2. We trained the model for 50 epochs on a server machine with a 2.90GHz Intel Xeon(R) E5-2690 CPU and 128GB memory.
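To put the stated architecture (2 LSTM layers of 512 units each) in perspective, the standard LSTM parameter count can be worked out directly: each of the four gates has an input weight matrix, a recurrent weight matrix, and a bias. The vocabulary size below is an assumption (DeepFuzz is character-level, so printable ASCII, roughly 95 symbols, is a plausible guess; the paper does not state it):

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameter count of one LSTM layer: four gates, each with an
    input weight matrix (units x input_dim), a recurrent weight
    matrix (units x units), and a bias vector (units)."""
    return 4 * (units * input_dim + units * units + units)

vocab = 95  # assumed character vocabulary size, not from the paper
layer1 = lstm_params(vocab, 512)
layer2 = lstm_params(512, 512)
print(layer1, layer2, layer1 + layer2)  # → 1245184 2099200 3344384
```

On the order of 3M trainable parameters, which is consistent with the paper's CPU-plus-single-GPU training setup reported in the Hardware Specification row.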