A Monte Carlo Tree Search approach to Active Malware Analysis

Authors: Riccardo Sartea, Alessandro Farinelli

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our solution using clustering techniques on models generated by analyzing real malware samples. Results show that our approach learns faster than existing techniques even without any prior information on the samples.
Researcher Affiliation | Academia | Riccardo Sartea, University of Verona, Department of Computer Science, riccardo.sartea@univr.it; Alessandro Farinelli, University of Verona, Department of Computer Science, alessandro.farinelli@univr.it
Pseudocode | Yes | Algorithm 1 Monte Carlo Analysis; Algorithm 2 Default Policy
Open Source Code | No | The paper does not provide any links to its source code or state that the code for their proposed methodology is open-source.
Open Datasets | Yes | The malware samples have been downloaded from [Xi'an Jiaotong University, 2011], for a total of 40 samples, 10 for each family. http://sanddroid.xjtu.edu.cn:8080
Dataset Splits | No | The paper mentions repeating analysis and clustering 10 times and using 40 samples from different families but does not provide specific training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper mentions using an "Android emulator" and setting a computational limit based on "emulator boot time" and "installation of the malware sample on the guest machine," but it does not specify any hardware details like CPU, GPU, or memory of the machine running the emulator.
Software Dependencies | No | The analysis environment is based on the Cuckoo sandbox [Cuckoo Foundation, 2016], specifically modified to meet the requirements of AMA. The paper mentions Cuckoo sandbox but does not provide a specific version number for it or any other software dependencies.
Experiment Setup | Yes | We tested different game lengths from 1 to 10 (figure 3); We used Cp = 1/√2, obtaining a good balance between exploitation of actions that are known to trigger malware responses, and exploration of actions that have unknown outcome; In our experiments we set the computational limit of MCTS to fit the Android emulator boot time, plus the time for the installation of the malware sample on the guest machine (about 30s in total for each analyzer action); We applied K-Means clustering, repeating analysis and clustering 10 times, and computing results as the average in terms of purity, inverse purity and f-score w.r.t. our ground truth.
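
The exploration constant Cp quoted in the experiment setup is the one that appears in the standard UCT selection rule from the MCTS literature. The sketch below assumes that standard form (mean reward plus 2·Cp·sqrt(2·ln N / n)) with Cp = 1/√2; the summary above does not confirm the authors' exact formula, and the function names and the action names "tap" and "sms" are hypothetical, chosen only for illustration.

```python
import math

# Exploration constant; 1/sqrt(2) is assumed here as the value used in the paper.
CP = 1 / math.sqrt(2)


def uct_score(total_reward, visits, parent_visits, cp=CP):
    """Standard UCT value of a child node (assumed form, not the authors' code)."""
    if visits == 0:
        # Unvisited actions are tried first.
        return math.inf
    exploit = total_reward / visits
    explore = 2 * cp * math.sqrt(2 * math.log(parent_visits) / visits)
    return exploit + explore


def select_action(children, parent_visits, cp=CP):
    """Pick the action with the highest UCT score.

    children maps an action name to a (total_reward, visits) pair.
    """
    return max(children, key=lambda a: uct_score(*children[a], parent_visits, cp))


if __name__ == "__main__":
    stats = {"tap": (5.0, 10), "sms": (0.0, 0)}  # hypothetical analyzer actions
    print(select_action(stats, parent_visits=10))
```

With cp set to 0 the rule becomes purely greedy on the mean reward, while larger values favour actions whose outcome is still unknown.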
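
The clustering metrics named in the experiment setup can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes that the f-score is the harmonic mean of purity and inverse purity, which is one common convention, since the summary does not state which variant was used. The function names and the toy cluster assignments are hypothetical.

```python
from collections import Counter


def purity(clusters, labels):
    """Fraction of samples that carry the dominant true label of their cluster."""
    per_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        per_cluster.setdefault(cluster_id, Counter())[label] += 1
    dominant = sum(max(counts.values()) for counts in per_cluster.values())
    return dominant / len(labels)


def inverse_purity(clusters, labels):
    """Purity with the roles of clusters and true labels swapped."""
    return purity(labels, clusters)


def f_score(clusters, labels):
    """Harmonic mean of purity and inverse purity (assumed definition)."""
    p = purity(clusters, labels)
    ip = inverse_purity(clusters, labels)
    return 2 * p * ip / (p + ip) if (p + ip) > 0 else 0.0


if __name__ == "__main__":
    # Toy ground truth shaped like the dataset: 4 families, 10 samples each.
    families = [f for f in "ABCD" for _ in range(10)]
    one_big_cluster = [0] * 40
    print(purity(one_big_cluster, families))
    print(inverse_purity(one_big_cluster, families))
    print(f_score(one_big_cluster, families))
```

Putting every sample in a single cluster maximises inverse purity but not purity, which is why the two are reported together with a combined score.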