Actor Critic Deep Reinforcement Learning for Neural Malware Control
Authors: Yu Wang, Jack Stokes, Mady Marinescu
AAAI 2020, pp. 1005-1012 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a new DRL-based system which instead employs a modified actor critic (AC) framework for the emulation halting task. This AC model dynamically predicts the best time to halt the file's execution based on a sequence of system API calls. Compared to the earlier models, the new model is capable of handling adversarial attacks by simulating their behaviors using the critic model. The new AC model demonstrates much better performance than both the DQN model and the antimalware engine's heuristics. In terms of execution speed (evaluated by the halting decision), the new model halts the execution of unknown files by up to 2.5% earlier than the DQN model and 93.6% earlier than the heuristics. For the task of detecting malicious files, the proposed AC model increases the true positive rate by 9.9% from 69.5% to 76.4% at a false positive rate of 1% compared to the DQN model, and by 83.4% from 41.2% to 76.4% at a false positive rate of 1% compared to a recently proposed LSTM model. |
| Researcher Affiliation | Industry | Yu Wang, Jack W. Stokes, Mady Marinescu Microsoft Corporation One Microsoft Way Redmond, WA 98052 {wany, jstokes, mady}@microsoft.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The event datasets used in this study are derived from a collection of 75,000 emulation scans which is evenly split between malware and benign files. All of the files have distinct sequences. This collection is then randomly split into 50,000, 10,000 and 15,000 for training, validation, and testing, respectively. No specific link, DOI, repository name, or formal citation for public access to this dataset is provided. |
| Dataset Splits | Yes | This collection is then randomly split into 50,000, 10,000 and 15,000 for training, validation, and testing, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | For all experiments, the deep learning system is implemented with Keras (Chollet and others 2015) and Theano (Al-Rfou et al. 2016). The paper mentions software names but does not specify their version numbers. |
| Experiment Setup | Yes | The DQN model uses a deep neural network with 3 dense hidden layers of dimension 200. The last layer is a softmax layer which generates two outputs representing the expected rewards of taking the two actions. For the actor critic model, there are two networks: one for the actor model and the other for the critic model. Both networks also contain three dense layers with a layer size of 200. The actor model is followed by a softmax layer which generates the action to perform. The minibatch size in all experiments is BRL = 50. We set μ = 50,000 for the replay memory length. For the reward function, we set the decay factor γ = 0.01 in both models. (A minimal network sketch based on this description follows the table.) |
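
To make the reported setup concrete, the following is a minimal, hypothetical reconstruction of the actor and critic networks in Keras, the framework the paper says it uses. It is not the authors' implementation: the hidden-layer activations, optimizer, losses, and the per-event input dimensionality (`INPUT_DIM`) are assumptions, while the layer count (3), layer size (200), softmax actor output over two actions, and the quoted hyperparameters come from the setup description above.

```python
# Sketch of the actor and critic networks described in the Experiment Setup row.
# Assumptions (not stated in the paper): relu activations, adam optimizer,
# cross-entropy / MSE losses, and INPUT_DIM = 128 for the encoded event features.
from keras.models import Sequential
from keras.layers import Dense

INPUT_DIM = 128          # assumed size of the encoded API-call event features
HIDDEN = 200             # "three dense layers with a layer size of 200"
N_ACTIONS = 2            # continue emulation vs. halt

def build_actor():
    """Actor network: 3 dense hidden layers, softmax over the two actions."""
    model = Sequential([
        Dense(HIDDEN, activation='relu', input_dim=INPUT_DIM),
        Dense(HIDDEN, activation='relu'),
        Dense(HIDDEN, activation='relu'),
        Dense(N_ACTIONS, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model

def build_critic():
    """Critic network: same hidden stack, scalar value estimate."""
    model = Sequential([
        Dense(HIDDEN, activation='relu', input_dim=INPUT_DIM),
        Dense(HIDDEN, activation='relu'),
        Dense(HIDDEN, activation='relu'),
        Dense(1, activation='linear'),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

# Hyperparameters quoted in the setup row
BATCH_SIZE = 50          # minibatch size BRL = 50
REPLAY_MEMORY = 50_000   # replay memory length mu = 50,000
GAMMA = 0.01             # reward decay factor
```

The DQN baseline described in the same row has the same dense stack but ends in a softmax over the expected rewards of the two actions, so `build_actor()` above doubles as a rough sketch of its topology.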
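The abstract quoted in the Research Type row states that the actor model "dynamically predicts the best time to halt the file's execution based on a sequence of system API calls." The sketch below shows one hedged way such a halting loop could look; `encode_event` and the action indexing are hypothetical placeholders, since the quoted text does not describe the paper's featurization.

```python
# Hypothetical halting-decision loop driven by a trained actor model.
# encode_event() is a toy stand-in for the paper's (unspecified) event encoding.
import numpy as np

INPUT_DIM = 128   # same assumed feature size as in the sketch above
HALT = 1          # assumed action index for "halt emulation"

def encode_event(api_call):
    """Hypothetical encoder mapping one API-call event to a feature vector."""
    vec = np.zeros(INPUT_DIM, dtype=np.float32)
    vec[hash(api_call) % INPUT_DIM] = 1.0   # toy hashing-trick encoding
    return vec

def run_until_halt(actor, api_call_stream):
    """Step through the emulated file's API calls; stop when the actor halts."""
    for step, api_call in enumerate(api_call_stream):
        state = encode_event(api_call)[np.newaxis, :]      # shape (1, INPUT_DIM)
        action_probs = actor.predict(state, verbose=0)[0]  # softmax over actions
        if np.argmax(action_probs) == HALT:
            return step   # halting decision made at this event
    return None           # stream ended before the actor chose to halt
```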