Calibrated Nonparametric Scan Statistics for Anomalous Pattern Detection in Graphs

Authors: Chunpai Wang, Daniel B. Neill, Feng Chen (pp. 4201-4209)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on both semi-synthetic and real-world datasets are demonstrated to validate the effectiveness of our proposed methods, in comparison with state-of-the-art counterparts."
Researcher Affiliation | Academia | 1) University at Albany SUNY; 2) New York University; 3) The University of Texas at Dallas. Contact: cwang25@albany.edu, daniel.neill@nyu.edu, feng.chen@utdallas.edu
Pseudocode | Yes | "More details are provided in Algorithm 1 in Appendix B.3. The pseudocode of estimating the maximum Nα for each N under a given significance threshold α is described in Algorithm 2 in Appendix B.4."
Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "Datasets: We use five semi-synthetic datasets from the Stanford Network Analysis Project (SNAP [1]), including 1) Wiki Vote; 2) Cond Mat; 3) Twitter; 4) Slashdot; and 5) DBLP." [1] https://snap.stanford.edu/data/
Dataset Splits | No | The paper does not explicitly provide the training, validation, and test splits needed to reproduce the experiments; it uses existing graph structures and simulates p-values and subgraphs.
Hardware Specification | No | The paper does not describe the specific hardware (e.g., CPU or GPU models, memory) used to run its experiments. It mentions "250 CPUs" for some baselines, which is not specific enough for reproducibility.
Software Dependencies | No | The paper does not give a reproducible description of ancillary software: it lists no version numbers for key software components or libraries.
Experiment Setup | No | The paper describes how data are simulated and which evaluation metrics are used, but it does not report the specific hyperparameter values or system-level training settings needed to reproduce the experimental setup.
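The Pseudocode row above quotes the paper's calibration step: estimating, for each subset size N, the maximum count Nα of p-values falling below a significance threshold α that would be expected under the null. A minimal Monte Carlo sketch of that idea, assuming i.i.d. Uniform(0, 1) null p-values (the function name and details here are hypothetical illustrations, not the paper's Algorithm 2):

```python
import random

def estimate_max_N_alpha(N_values, alpha, num_trials=1000, seed=0):
    """Monte Carlo sketch: for each candidate subset size N, estimate the
    largest count of p-values below `alpha` observed across simulated null
    samples, where null p-values are modeled as i.i.d. Uniform(0, 1).

    Illustrative only -- this does not reproduce the paper's Algorithm 2.
    """
    rng = random.Random(seed)
    estimates = {}
    for N in N_values:
        max_count = 0
        for _ in range(num_trials):
            # Count p-values below alpha in one simulated null sample of size N.
            count = sum(rng.random() < alpha for _ in range(N))
            max_count = max(max_count, count)
        estimates[N] = max_count
    return estimates
```

Under the null, each count is Binomial(N, α), so the estimated envelope grows with both N and α; a nonparametric scan statistic is then calibrated against such null envelopes rather than an assumed parametric form.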