DeepFuzz: Automatic Generation of Syntax Valid C Programs for Fuzz Testing
Authors: Xiao Liu, Xiaoting Li, Rupesh Prajapati, Dinghao Wu (pp. 1044-1051)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a detailed case study to analyze the success rate and coverage improvement of the generated C programs for fuzz testing. We analyze the performance of DEEPFUZZ with three types of sampling methods as well as three types of generation strategies. In our preliminary study, we found and reported 8 bugs of GCC, all of which are actively being addressed by developers. |
| Researcher Affiliation | Academia | Xiao Liu, Xiaoting Li, Rupesh Prajapati, Dinghao Wu; College of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA |
| Pseudocode | No | The paper includes mathematical equations for RNNs and LSTMs but does not provide any pseudocode or clearly labeled algorithm blocks for the DEEPFUZZ process or specific methods. |
| Open Source Code | Yes | We have released the source code for public dissemination. Footnote 3: https://github.com/s3team/DeepFuzz |
| Open Datasets | Yes | For the training data set, we adopted the original GCC test suite where there are over 10,000 short, or small, programs that cover most of the features specified in the C11 standard. Originally, the training data set, which contains 10,000 well-formed C programs, was collected and sampled from the GCC test suites. |
| Dataset Splits | No | The paper describes how training sequences are formed and discusses training for a certain number of epochs, but it does not specify any explicit validation dataset or how the data is split into training/validation sets. |
| Hardware Specification | Yes | We trained the model for 50 epochs on a server machine with 2.90GHz Intel Xeon(R) E5-2690 CPU and 128GB memory. Acknowledgement: We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research. |
| Software Dependencies | No | The paper mentions using 'gcc' and 'gcov' for compilation and coverage collection but does not provide specific version numbers for these or any other software libraries or frameworks used to implement DEEPFUZZ. |
| Experiment Setup | Yes | We trained a Sequence-to-Sequence model with 2 layers and there are 512 LSTM units per layer. We set the dropout rate of 0.2. We trained the model for 50 epochs on a server machine with 2.90GHz Intel Xeon(R) E5-2690 CPU and 128GB memory. |
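
To give a rough sense of the model size implied by the Experiment Setup row, the following is a minimal sketch that records the quoted hyperparameters and counts the weights in a 2-layer, 512-unit LSTM stack. It is not the authors' implementation. The per-step input size (`input_dim = 96`) is a hypothetical value chosen only for illustration, and the names `LstmStackConfig`, `lstm_layer_params`, and `stack_params` are invented here. The count assumes one bias vector per gate (some frameworks use two) and ignores any embedding, output, or additional encoder/decoder layers the full Sequence-to-Sequence model may have.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LstmStackConfig:
    """Hyperparameters quoted in the Experiment Setup row, plus one assumption."""
    num_layers: int = 2      # from the paper
    hidden_units: int = 512  # from the paper
    dropout: float = 0.2     # from the paper
    epochs: int = 50         # from the paper
    input_dim: int = 96      # ASSUMPTION: hypothetical input size, not from the paper


def lstm_layer_params(input_dim: int, hidden_units: int) -> int:
    """Weights in one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a single bias vector."""
    return 4 * (hidden_units * (input_dim + hidden_units) + hidden_units)


def stack_params(cfg: LstmStackConfig) -> List[int]:
    """Per-layer parameter counts; every layer after the first takes the
    previous layer's hidden state as its input."""
    counts = []
    layer_input = cfg.input_dim
    for _ in range(cfg.num_layers):
        counts.append(lstm_layer_params(layer_input, cfg.hidden_units))
        layer_input = cfg.hidden_units
    return counts


if __name__ == "__main__":
    cfg = LstmStackConfig()
    per_layer = stack_params(cfg)
    print("parameters per layer:", per_layer)
    print("total LSTM parameters:", sum(per_layer))
```

Under these assumptions only the first layer depends on the input size; the second layer's count is fixed by the 512 hidden units alone.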