reproducibilityindex.ai

A Monte Carlo Tree Search approach to Active Malware Analysis

Authors: Riccardo Sartea, Alessandro Farinelli

IJCAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our solution using clustering techniques on models generated by analyzing real malware samples. Results show that our approach learns faster than existing techniques even without any prior information on the samples.
Researcher Affiliation	Academia	Riccardo Sartea University of Verona Department of Computer Science riccardo.sartea@univr.it Alessandro Farinelli University of Verona Department of Computer Science alessandro.farinelli@univr.it
Pseudocode	Yes	Algorithm 1 Monte Carlo Analysis; Algorithm 2 Default Policy
Open Source Code	No	The paper does not provide any links to its source code or state that the code for their proposed methodology is open-source.
Open Datasets	Yes	The malware samples have been downloaded from [Xi'an Jiaotong University, 2011], for a total of 40 samples, 10 for each family. http://sanddroid.xjtu.edu.cn:8080
Dataset Splits	No	The paper mentions repeating analysis and clustering 10 times and using 40 samples from different families but does not provide specific training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification	No	The paper mentions using an "Android emulator" and setting a computational limit based on "emulator boot time" and "installation of the malware sample on the guest machine," but it does not specify any hardware details like CPU, GPU, or memory of the machine running the emulator.
Software Dependencies	No	The analysis environment is based on the Cuckoo sandbox [Cuckoo Foundation, 2016], specifically modified to meet the requirements of AMA. The paper mentions Cuckoo sandbox but does not provide a specific version number for it or any other software dependencies.
Experiment Setup	Yes	We tested different game lengths from 1 to 10 (figure 3); We used Cp = 1/√2, obtaining a good balance between exploitation of actions that are known to trigger malware responses, and exploration of actions that have unknown outcome; In our experiments we set the computational limit of MCTS to fit the Android emulator boot time, plus the time for the installation of the malware sample on the guest machine (about 30s in total for each analyzer action); We applied K-Means clustering, repeating analysis and clustering 10 times, and computing results as the average in terms of purity, inverse purity and f-score w.r.t. our ground truth.
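
The clustering metrics named in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy labels ("famA", "famB"), and the choice to compute the f-score as the harmonic mean of purity and inverse purity are all assumptions made for illustration, since the quoted text does not say which f-score variant was used.

```python
from collections import Counter, defaultdict


def purity(clusters, labels):
    """Fraction of samples that belong to the majority label of their cluster."""
    counts = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        counts[cluster_id][label] += 1
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


def inverse_purity(clusters, labels):
    """Purity with the roles of clusters and ground-truth labels swapped."""
    return purity(labels, clusters)


def f_score(clusters, labels):
    """Harmonic mean of purity and inverse purity (an assumed variant)."""
    p = purity(clusters, labels)
    ip = inverse_purity(clusters, labels)
    return 2 * p * ip / (p + ip)


# Hypothetical toy data: six samples, two ground-truth families, two clusters.
labels = ["famA", "famA", "famA", "famB", "famB", "famB"]
clusters = [0, 0, 1, 1, 1, 1]

p = purity(clusters, labels)
ip = inverse_purity(clusters, labels)
f = f_score(clusters, labels)
print(p, ip, f)
```

In this toy case one "famA" sample lands in the cluster dominated by "famB", so both purity and inverse purity are 5/6. Averaging such scores over repeated runs would mirror the 10 repetitions described in the table, under the same assumptions.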