Actor Critic Deep Reinforcement Learning for Neural Malware Control
Authors: Yu Wang, Jack Stokes, Mady Marinescu
AAAI 2020, pp. 1005-1012 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a new DRL-based system which instead employs a modified actor critic (AC) framework for the emulation halting task. This AC model dynamically predicts the best time to halt the file's execution based on a sequence of system API calls. Compared to the earlier models, the new model is capable of handling adversarial attacks by simulating their behaviors using the critic model. The new AC model demonstrates much better performance than both the DQN model and the antimalware engine's heuristics. In terms of execution speed (evaluated by the halting decision), the new model halts the execution of unknown files by up to 2.5% earlier than the DQN model and 93.6% earlier than the heuristics. For the task of detecting malicious files, the proposed AC model increases the true positive rate by 9.9% from 69.5% to 76.4% at a false positive rate of 1% compared to the DQN model, and by 83.4% from 41.2% to 76.4% at a false positive rate of 1% compared to a recently proposed LSTM model. |
| Researcher Affiliation | Industry | Yu Wang, Jack W. Stokes, Mady Marinescu Microsoft Corporation One Microsoft Way Redmond, WA 98052 {wany, jstokes, mady}@microsoft.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The event datasets used in this study are derived from a collection of 75,000 emulation scans which is evenly split between malware and benign files. All of the files have distinct sequences. This collection is then randomly split into 50,000, 10,000 and 15,000 for training, validation, and testing, respectively. No specific link, DOI, repository name, or formal citation for public access to this dataset is provided. |
| Dataset Splits | Yes | This collection is then randomly split into 50,000, 10,000 and 15,000 for training, validation, and testing, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | For all experiments, the deep learning system is implemented with Keras (Chollet and others 2015) and Theano (Al-Rfou et al. 2016). The paper mentions software names but does not specify their version numbers. |
| Experiment Setup | Yes | The DQN model uses a deep neural network with 3 dense hidden layers of dimension 200. The last layer is a softmax layer which generates two outputs representing the expected rewards of taking the two actions. For the actor critic model, there are two networks: one for the actor model and the other for the critic model. Both networks also contain three dense layers with a layer size of 200. The actor model is followed by a softmax layer which generates the action to perform. The minibatch size in all experiments is BRL = 50. We set μ = 50,000 for the replay memory length. For the reward function, we set the decay factor γ = 0.01 in both models. (A minimal network sketch based on this description follows the table.) |
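
To make the reported setup concrete, the following is a minimal, hypothetical reconstruction of the actor and critic networks in Keras, the framework the paper says it uses. It is not the authors' implementation: the hidden-layer activations, optimizer, losses, and the per-event input dimensionality (`INPUT_DIM`) are assumptions, while the layer count (3), layer size (200), softmax actor output over two actions, and the quoted hyperparameters come from the setup description above.

```python
# Sketch of the actor and critic networks described in the Experiment Setup row.
# Assumptions (not stated in the paper): relu activations, adam optimizer,
# cross-entropy / MSE losses, and INPUT_DIM = 128 for the encoded event features.
from keras.models import Sequential
from keras.layers import Dense

INPUT_DIM = 128          # assumed size of the encoded API-call event features
HIDDEN = 200             # "three dense layers with a layer size of 200"
N_ACTIONS = 2            # continue emulation vs. halt

def build_actor():
    """Actor network: 3 dense hidden layers, softmax over the two actions."""
    model = Sequential([
        Dense(HIDDEN, activation='relu', input_dim=INPUT_DIM),
        Dense(HIDDEN, activation='relu'),
        Dense(HIDDEN, activation='relu'),
        Dense(N_ACTIONS, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model

def build_critic():
    """Critic network: same hidden stack, scalar value estimate."""
    model = Sequential([
        Dense(HIDDEN, activation='relu', input_dim=INPUT_DIM),
        Dense(HIDDEN, activation='relu'),
        Dense(HIDDEN, activation='relu'),
        Dense(1, activation='linear'),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# Hyperparameters quoted in the setup row
BATCH_SIZE = 50          # minibatch size BRL = 50
REPLAY_MEMORY = 50_000   # replay memory length mu = 50,000
GAMMA = 0.01             # reward decay factor
```

The DQN baseline described in the same row has the same dense stack but ends in a softmax over the expected rewards of the two actions, so `build_actor()` above doubles as a rough sketch of its topology.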
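The abstract quoted in the Research Type row states that the actor model "dynamically predicts the best time to halt the file's execution based on a sequence of system API calls." The sketch below shows one hedged way such a halting loop could look; `encode_event` and the action indexing are hypothetical placeholders, since the quoted text does not describe the paper's featurization.

```python
# Hypothetical halting-decision loop driven by a trained actor model.
# encode_event() is a toy stand-in for the paper's (unspecified) event encoding.
import numpy as np

INPUT_DIM = 128   # same assumed feature size as in the sketch above
HALT = 1          # assumed action index for "halt emulation"

def encode_event(api_call):
    """Hypothetical encoder mapping one API-call event to a feature vector."""
    vec = np.zeros(INPUT_DIM, dtype=np.float32)
    vec[hash(api_call) % INPUT_DIM] = 1.0   # toy hashing-trick encoding
    return vec

def run_until_halt(actor, api_call_stream):
    """Step through the emulated file's API calls; stop when the actor halts."""
    for step, api_call in enumerate(api_call_stream):
        state = encode_event(api_call)[np.newaxis, :]      # shape (1, INPUT_DIM)
        action_probs = actor.predict(state, verbose=0)[0]  # softmax over actions
        if np.argmax(action_probs) == HALT:
            return step   # halting decision made at this event
    return None           # stream ended before the actor chose to halt
```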