MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams

Authors: Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos
Pages: 3242-3249

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that MIDAS outperforms baseline approaches by 46%-52% accuracy (in terms of AUC), and processes the data 108-505 times faster than baseline approaches.
Researcher Affiliation | Academia | (1) National University of Singapore, (2) Carnegie Mellon University, (3) KAIST; {siddharth, bhooi}@comp.nus.edu.sg, {minjiy, christos}@cs.cmu.edu, kijungs@kaist.ac.kr
Pseudocode | Yes | Algorithm 1: MIDAS: Streaming Anomaly Scoring; Algorithm 2: MIDAS-R: Incorporating Relations
Open Source Code | Yes | Reproducibility: Our code and datasets are publicly available at https://github.com/bhatiasiddharth/MIDAS.
Open Datasets | Yes | Datasets: DARPA (Lippmann et al. 1999) has 4.5M IP-IP communications... Twitter Security (Rayana and Akoglu 2015; 2016)... Twitter World Cup (Rayana and Akoglu 2015; 2016)...
Dataset Splits | No | No explicit training, validation, or test splits (e.g., percentages or absolute counts) are mentioned. The paper evaluates on datasets with ground-truth labels but does not describe typical model training splits.
Hardware Specification | Yes | All experiments are carried out on a 2.7GHz Intel Core i5 processor, 16GB RAM, running OS X 10.14.6.
Software Dependencies | No | The paper states 'We implement MIDAS and MIDAS-R in C++' but does not specify the compiler version or any libraries used. It also mentions using 'an open-sourced implementation of SEDANSPOT' without giving its version.
Experiment Setup | Yes | We use 2 hash functions for the CMS data structures, and we set the number of CMS buckets to 2719 to result in an approximation error of ν = 0.001. For MIDAS-R, we set the temporal decay factor α as 0.5. We used an open-sourced implementation of SEDANSPOT, provided by the authors, following parameter settings as suggested in the original paper (sample size 500).
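
The Experiment Setup values can be read against the standard Count-Min Sketch (CMS) sizing rule, width = ceil(e / error): for an error of 0.001 this gives ceil(2718.28...) = 2719 buckets, which matches the reported setting. The Python sketch below is illustrative only and is not the authors' C++ implementation. The names cms_dimensions, CountMinSketch, and edge_key are hypothetical, the hash family is a generic one chosen for the example, and applying the decay factor by scaling every counter at a new time tick is an assumption about how α = 0.5 is used.

```python
import math
import random


def cms_dimensions(epsilon, delta):
    """Standard CMS sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta))."""
    return math.ceil(math.e / epsilon), math.ceil(math.log(1.0 / delta))


def edge_key(u, v):
    """Hypothetical helper: pack a directed edge (u, v) into one integer key."""
    return u * 1_000_003 + v


class CountMinSketch:
    """Minimal Count-Min Sketch with float counters so that counts can be decayed."""

    def __init__(self, depth=2, width=2719, seed=0):
        rng = random.Random(seed)
        self.depth = depth
        self.width = width
        self.prime = 2_147_483_647  # Mersenne prime 2**31 - 1
        # One (a, b) pair per row for the hash ((a * key + b) mod prime) mod width.
        self.params = [
            (rng.randrange(1, self.prime), rng.randrange(0, self.prime))
            for _ in range(depth)
        ]
        self.rows = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, key):
        a, b = self.params[row]
        return ((a * key + b) % self.prime) % self.width

    def add(self, key, amount=1.0):
        """Increment the key's bucket in every row."""
        for r in range(self.depth):
            self.rows[r][self._bucket(r, key)] += amount

    def estimate(self, key):
        """Return the minimum over rows; collisions can only inflate a counter."""
        return min(self.rows[r][self._bucket(r, key)] for r in range(self.depth))

    def decay(self, alpha):
        """Scale every counter by alpha (assumed use of the temporal decay factor)."""
        for row in self.rows:
            for i in range(self.width):
                row[i] *= alpha


# Usage: size the sketch, count one edge four times, then decay at a new tick.
width, depth = cms_dimensions(epsilon=0.001, delta=0.14)
sketch = CountMinSketch(depth=depth, width=width)
sketch.add(edge_key(1, 2), 4.0)
before = sketch.estimate(edge_key(1, 2))
sketch.decay(0.5)
after = sketch.estimate(edge_key(1, 2))
print(width, depth, before, after)
```

Under the same standard analysis, 2 hash functions correspond to a failure probability of roughly e^-2 (about 0.135); the source does not state which probability bound the authors targeted, so the delta value above is only a placeholder that yields a depth of 2.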