Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams
Authors: Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos
Pages: 3242-3249
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that MIDAS outperforms baseline approaches by 46%-52% accuracy (in terms of AUC), and processes the data 108-505 times faster than baseline approaches. |
| Researcher Affiliation | Academia | ¹National University of Singapore, ²Carnegie Mellon University, ³KAIST; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: MIDAS: Streaming Anomaly Scoring; Algorithm 2: MIDAS-R: Incorporating Relations |
| Open Source Code | Yes | Reproducibility: Our code and datasets are publicly available at https://github.com/bhatiasiddharth/MIDAS. |
| Open Datasets | Yes | Datasets: DARPA (Lippmann et al. 1999) has 4.5M IP-IP communications... Twitter Security (Rayana and Akoglu 2015; 2016)... Twitter World Cup (Rayana and Akoglu 2015; 2016)... |
| Dataset Splits | No | No explicit training, validation, and test dataset splits (e.g., percentages or absolute counts) are mentioned. The paper describes using datasets with ground truth for evaluation but not typical model training splits. |
| Hardware Specification | Yes | All experiments are carried out on a 2.7GHz Intel Core i5 processor, 16GB RAM, running OS X 10.14.6. |
| Software Dependencies | No | The paper mentions 'We implement MIDAS and MIDAS-R in C++' but does not specify the version of the compiler or any libraries used. It also mentions using 'an open-sourced implementation of SEDANSPOT' without providing its version. |
| Experiment Setup | Yes | We use 2 hash functions for the CMS data structures, and we set the number of CMS buckets to 2719 to result in an approximation error of ν = 0.001. For MIDAS-R, we set the temporal decay factor α as 0.5. We used an open-sourced implementation of SEDANSPOT, provided by the authors, following parameter settings as suggested in the original paper (sample size 500). |
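
The bucket count quoted in the Experiment Setup row is consistent with the standard count-min sketch (CMS) width bound w = ⌈e/ν⌉, which gives 2719 for ν = 0.001. The Python sketch below illustrates that arithmetic and a streaming scorer in the spirit of Algorithm 1. It is a minimal illustration under stated assumptions, not the authors' C++ implementation: the names `cms_width`, `CountMinSketch`, and `midas_scores` are hypothetical, the salted hashing scheme is an arbitrary choice, and the chi-squared-style score is the form commonly given for MIDAS and is assumed here rather than taken from this page.

```python
import math
import random


def cms_width(eps):
    """Standard CMS width bound: ceil(e / eps) buckets per row."""
    return math.ceil(math.e / eps)


class CountMinSketch:
    """Tiny count-min sketch (hypothetical helper, not the authors' code)."""

    def __init__(self, num_hashes=2, num_buckets=2719, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        # One random salt per row stands in for an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(num_hashes)]
        self.table = [[0.0] * num_buckets for _ in range(num_hashes)]

    def _index(self, row, key):
        return hash((self.salts[row], key)) % self.num_buckets

    def add(self, key, amount=1.0):
        for row in range(len(self.table)):
            self.table[row][self._index(row, key)] += amount

    def query(self, key):
        # The minimum over rows is the usual CMS point estimate.
        return min(
            self.table[row][self._index(row, key)]
            for row in range(len(self.table))
        )

    def clear(self):
        for row in self.table:
            for i in range(len(row)):
                row[i] = 0.0


def midas_scores(edges, num_hashes=2, num_buckets=2719):
    """Score each edge (u, v, t); t is an integer tick, non-decreasing, from 1.

    Assumed score: (a - s/t)^2 * t^2 / (s * (t - 1)), where a is the
    current-tick count and s is the all-time count of the edge (u, v).
    """
    current = CountMinSketch(num_hashes, num_buckets)
    total = CountMinSketch(num_hashes, num_buckets)
    current_tick = None
    scores = []
    for u, v, t in edges:
        if t != current_tick:
            current.clear()  # per-tick counts reset when time advances
            current_tick = t
        current.add((u, v))
        total.add((u, v))
        a = current.query((u, v))
        s = total.query((u, v))
        # At t = 1 there is no history to compare against, so score 0.
        score = 0.0 if t <= 1 else (a - s / t) ** 2 * t ** 2 / (s * (t - 1))
        scores.append(score)
    return scores


# Usage: one edge per tick for three ticks, then a burst of ten at tick 4.
stream = [(1, 2, 1), (1, 2, 2), (1, 2, 3)] + [(1, 2, 4)] * 10
burst_scores = midas_scores(stream)
```

In this toy stream the score stays at zero while the edge arrives at a steady rate and grows as the burst at tick 4 accumulates, which is the microcluster behavior the detector targets. The temporal decay used by MIDAS-R (α = 0.5 in the paper's setup) is not modeled here.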