Dynamic Malware Analysis with Feature Engineering and Feature Learning

Authors: Zhaoqi Zhang, Panpan Qi, Wei Wang

AAAI 2020, pp. 1210-1217 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our solution outperforms baselines significantly on a large real dataset. Valuable insights about feature engineering and architecture design are derived from the ablation study.
Researcher Affiliation | Academia | Zhaoqi Zhang, Panpan Qi, Wei Wang, School of Computing, National University of Singapore; {zhaoqi.zhang, qipanpan}@u.nus.edu, wangwei@comp.nus.edu.sg
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | We propose a novel feature representation for system API arguments. The extracted features from our dataset will be released for public access. [...] https://github.com/joddiy/DynamicMalwareAnalysis is the link of the code and the dataset.
Open Datasets | Yes | The collected data are archived by date, and we pick two months of data (April and May) to conduct our experiments. All these PE files are processed by our system (as shown in Figure 1) to collect the API call sequences. Table 2 is a summary of the data, where each row represents the statistics of the data in a month. [...] https://github.com/joddiy/DynamicMalwareAnalysis is the link of the code and the dataset.
Dataset Splits | Yes | We use 4-fold cross-validation (or CV) over the April dataset to train the models and do the testing over the May dataset.
Hardware Specification | No | The paper mentions training on a "model server with GPUs" but does not provide specific details on the GPU models, CPU models, or any other hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions the "Cuckoo" sandbox and a "Windows 7 system" but does not provide specific version numbers for these or for other software dependencies such as deep learning frameworks (e.g., TensorFlow, PyTorch) or programming languages.
Experiment Setup | Yes | In addition, the optimization method we take is Adam, and the learning rate is 0.001. ... ℓ(X, y) = −(y log(P[Y = 1|X]) + (1 − y) log(P[Y = 0|X])) (3) ... The number of units of each LSTM is 100. ... a dense layer with units number 64... A ReLU activation is applied to this dense layer. Then we use a dropout layer with a rate of 0.5... A Sigmoid activation is appended... All convolution layers' filter size is 128, and the stride is 1.
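
The quoted hyper-parameters are enough to sketch the reported training configuration. Below is a minimal, non-authoritative sketch assuming a Keras/TensorFlow stack (the paper does not name its framework). The input shape, convolution kernel size, the reading of "filter size is 128" as the number of filters, and the bidirectional wrapping of the LSTM are assumptions; the 128 convolution filters with stride 1, 100 LSTM units, 64-unit ReLU dense layer, dropout rate of 0.5, sigmoid output, binary cross-entropy loss of Eq. (3), and Adam with learning rate 0.001 come from the quoted text.

```python
# Minimal sketch of the quoted training setup (assumed Keras/TensorFlow stack).
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, FEAT_DIM = 1000, 128  # hypothetical API-call sequence length and feature width

inputs = layers.Input(shape=(SEQ_LEN, FEAT_DIM))
# Convolution over the API-call sequence: 128 filters, stride 1 (kernel size assumed).
x = layers.Conv1D(filters=128, kernel_size=3, strides=1, padding="same", activation="relu")(inputs)
# LSTM with 100 units per direction; the bidirectional wrapping is an assumption.
x = layers.Bidirectional(layers.LSTM(100))(x)
# Dense layer with 64 units and ReLU activation, followed by dropout at rate 0.5.
x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.5)(x)
# Sigmoid output for the binary malware label, trained with the
# binary cross-entropy loss of Eq. (3).
outputs = layers.Dense(1, activation="sigmoid")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```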
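
The dataset-split protocol quoted above (4-fold CV over the April data, testing over the May data) can be sketched the same way. The `load_month` and `build_model` helpers are hypothetical placeholders; the epoch count, batch size, and whether a final model is retrained on the full April data are not stated in the paper.

```python
# Sketch of the quoted evaluation protocol: 4-fold cross-validation on the
# April data, final testing on the held-out May data.
import numpy as np
from sklearn.model_selection import KFold

X_april, y_april = load_month("april")  # hypothetical loader returning numpy arrays
X_may, y_may = load_month("may")        # hypothetical loader returning numpy arrays

kf = KFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X_april):
    model = build_model()  # hypothetical helper, e.g. the model sketch above
    model.fit(X_april[train_idx], y_april[train_idx],
              validation_data=(X_april[val_idx], y_april[val_idx]),
              epochs=10, batch_size=64)  # epochs and batch size not stated in the paper

# Evaluate on the May data, which is held out from all model selection.
final_model = build_model()
final_model.fit(X_april, y_april, epochs=10, batch_size=64)
print(final_model.evaluate(X_may, y_may))
```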