Heterogeneous Graph Matching Networks for Unknown Malware Detection

Authors: Shen Wang, Zhengzhang Chen, Xiao Yu, Ding Li, Jingchao Ni, Lu-An Tang, Jiaping Gui, Zhichun Li, Haifeng Chen, Philip S. Yu

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a systematic evaluation of our model and show that it is accurate in detecting malicious program behavior and can help detect malware attacks with less false positives. MatchGNet outperforms the state-of-the-art algorithms in malware detection by generating 50% less false positives while keeping zero false negatives.
Researcher Affiliation | Collaboration | (1) University of Illinois at Chicago, USA; (2) NEC Laboratories America, USA; (3) Tsinghua University, China
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | We collect a 20-week period of data from a real enterprise network composed of 109 hosts (87 Windows hosts and 22 Linux hosts). The sheer size of the data set is around three terabytes.
Dataset Splits | Yes | We evaluate the selection of hyper-parameters of MatchGNet with our validating data set (i.e., data from the sixth week). To simulate unknown program instances, we split the programs in the training data equally into two sets, the known set and the unknown set. In our five weeks of training data, we exclude the programs in the unknown set and only train the model on the programs in the known set.
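The known/unknown split protocol described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function names, the event representation, and the use of a seeded shuffle are assumptions made for the example.

```python
import random

def split_known_unknown(programs, seed=0):
    """Split program names equally into a 'known' set (used for training)
    and an 'unknown' set (held out to simulate unseen programs)."""
    rng = random.Random(seed)  # seeded for a repeatable split
    shuffled = list(programs)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    known, unknown = set(shuffled[:half]), set(shuffled[half:])
    return known, unknown

def filter_training_events(events, known):
    """Keep only the training events whose program is in the known set,
    mirroring the exclusion of unknown-set programs from training."""
    return [e for e in events if e["program"] in known]

programs = [f"prog{i}" for i in range(10)]
known, unknown = split_known_unknown(programs)
events = [{"program": p} for p in programs]
training_events = filter_training_events(events, known)
```

Under this sketch the two sets are disjoint and equal in size, and no event from an unknown-set program reaches the training data.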
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We find that when MatchGNet has 3 layers and 500 neurons, it reaches the maximal AUC. Larger hyper-parameter values may consume more resources but have little improvement on the AUC. Thus, we use the optimal hyper-parameters as a part of the default model and apply them to the other parts of our experiments.
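The hyper-parameter selection described above amounts to a grid search over (layers, neurons) scored by validation AUC. The sketch below is hypothetical: `mock_validation_auc` stands in for training MatchGNet and scoring it on the sixth-week validation data, and the candidate grid values (other than the reported optimum of 3 layers and 500 neurons) are assumptions.

```python
def select_hyperparameters(candidates, evaluate_auc):
    """Return the configuration with the highest validation AUC."""
    best_cfg, best_auc = None, float("-inf")
    for cfg in candidates:
        auc = evaluate_auc(cfg)
        if auc > best_auc:
            best_cfg, best_auc = cfg, auc
    return best_cfg, best_auc

# Hypothetical stand-in for training the model and computing validation
# AUC; constructed to peak at 3 layers / 500 neurons, mirroring the
# optimum reported in the paper.
def mock_validation_auc(cfg):
    layers, neurons = cfg
    return 1.0 - abs(layers - 3) * 0.1 - abs(neurons - 500) / 10000.0

grid = [(l, n) for l in (1, 2, 3, 4) for n in (100, 300, 500, 700)]
best_cfg, best_auc = select_hyperparameters(grid, mock_validation_auc)
# best_cfg == (3, 500)
```

The selected configuration is then frozen and reused across the remaining experiments, as the setup describes.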