reproducibilityindex.ai

Midas: Microcluster-Based Detector of Anomalies in Edge Streams

Authors: Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos. Pages 3242-3249

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results show that MIDAS outperforms baseline approaches by 46%-52% accuracy (in terms of AUC), and processes the data 108-505 times faster than baseline approaches.
Researcher Affiliation	Academia	(1) National University of Singapore, (2) Carnegie Mellon University, (3) KAIST; {siddharth, bhooi}@comp.nus.edu.sg, {minjiy, christos}@cs.cmu.edu, kijungs@kaist.ac.kr
Pseudocode	Yes	Algorithm 1: MIDAS: Streaming Anomaly Scoring; Algorithm 2: MIDAS-R: Incorporating Relations
Open Source Code	Yes	Reproducibility: Our code and datasets are publicly available at https://github.com/bhatiasiddharth/MIDAS.
Open Datasets	Yes	Datasets: DARPA (Lippmann et al. 1999) has 4.5M IP-IP communications... Twitter Security (Rayana and Akoglu 2015; 2016)... Twitter World Cup (Rayana and Akoglu 2015; 2016)...
Dataset Splits	No	No explicit training, validation, and test dataset splits (e.g., percentages or absolute counts) are mentioned. The paper describes using datasets with ground truth for evaluation but not typical model training splits.
Hardware Specification	Yes	All experiments are carried out on a 2.7GHz Intel Core i5 processor, 16GB RAM, running OS X 10.14.6.
Software Dependencies	No	The paper mentions 'We implement MIDAS and MIDAS-R in C++' but does not specify the version of the compiler or any libraries used. It also mentions using 'an open-sourced implementation of SEDANSPOT' without providing its version.
Experiment Setup	Yes	We use 2 hash functions for the CMS data structures, and we set the number of CMS buckets to 2719 to result in an approximation error of ν = 0.001. For MIDAS-R, we set the temporal decay factor α as 0.5. We used an open-sourced implementation of SEDANSPOT, provided by the authors, following parameter settings as suggested in the original paper (sample size 500).
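
The bucket count quoted in the Experiment Setup row is consistent with the standard Count-Min Sketch sizing rule, width = ceil(e / error), which gives 2719 for an error of 0.001. The Python sketch below illustrates that arithmetic with a small two-row sketch. It is not the authors' C++ implementation: the names `cms_width`, `CountMinSketch`, and `decay` are hypothetical, the hashing scheme is a stand-in, and multiplying all counts by α is only an assumption about how a temporal decay factor might be applied.

```python
import math
import random


def cms_width(error: float) -> int:
    """Buckets per row under the standard Count-Min Sketch bound: ceil(e / error)."""
    return math.ceil(math.e / error)


class CountMinSketch:
    """Illustrative Count-Min Sketch; not the paper's implementation."""

    def __init__(self, rows: int, width: int, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        # One random salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(rows)]
        self.table = [[0.0] * width for _ in range(rows)]

    def _index(self, salt: int, key) -> int:
        return hash((salt, key)) % self.width

    def add(self, key, amount: float = 1.0) -> None:
        # Increment the key's bucket in every row.
        for row, salt in zip(self.table, self.salts):
            row[self._index(salt, key)] += amount

    def estimate(self, key) -> float:
        # The minimum across rows never underestimates the true count.
        return min(
            row[self._index(salt, key)]
            for row, salt in zip(self.table, self.salts)
        )

    def decay(self, alpha: float) -> None:
        # Assumption: temporal decay scales every stored count by alpha.
        for row in self.table:
            for i in range(len(row)):
                row[i] *= alpha


if __name__ == "__main__":
    width = cms_width(0.001)                      # 2719 buckets
    sketch = CountMinSketch(rows=2, width=width)  # 2 hash functions
    edge = (1, 2)                                 # hypothetical (source, destination) pair
    sketch.add(edge)
    sketch.add(edge)
    print(width, sketch.estimate(edge))
    sketch.decay(0.5)                             # alpha = 0.5, as quoted for MIDAS-R
    print(sketch.estimate(edge))
```

With only two rows, the sketch trades a higher failure probability for speed and memory; adding rows would tighten the probabilistic guarantee at the cost of more hashing per edge.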