Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection

Authors: Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first show through extensive simulations that NN methods compare favorably to some of the other state-of-the-art algorithms for anomaly detection based on a set of benchmark synthetic datasets. We further consider the performance of NN methods on real datasets, and relate it to the dimensionality of the problem. Next, we analyze the theoretical properties of NN methods for anomaly detection by studying a more general quantity called distance-to-measure (DTM), originally developed in the literature on robust geometric and topological inference.
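The NN scores discussed above are simple to state: the kth-NN score of a query point is the distance to its kth nearest neighbor, and the DTM with power p = 2 (the DTM2 variant evaluated in the paper) is the root mean of the k smallest squared distances. The following NumPy sketch illustrates both quantities; the function and variable names are ours, not from the authors' released code:

```python
import numpy as np

def knn_scores(train, query, k):
    """Return two NN-based anomaly scores for each query point:
    the kth-NN distance, and the DTM with p = 2 (root mean of the
    k smallest squared distances to the training set)."""
    # pairwise Euclidean distances: shape (n_query, n_train)
    d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, :k]            # k smallest distances per query
    kth_nn = knn[:, -1]                        # distance to the kth neighbor
    dtm2 = np.sqrt((knn ** 2).mean(axis=1))    # DTM score with p = 2
    return kth_nn, dtm2
```

Both scores are monotone in "how far the query sits from the data"; the DTM averages over the k nearest distances, which makes it less sensitive to a single unusually close neighbor than the plain kth-NN distance.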
Researcher Affiliation | Academia | (1) Department of Statistics and Data Science, Carnegie Mellon University; (2) Heinz College of Information Systems and Public Policy, Carnegie Mellon University. {xgu1,lakoglu}@andrew.cmu.edu, arinaldo@cmu.edu
Pseudocode | No | The paper describes the methods verbally and mathematically but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for all our experiments are publicly available. (Footnote 2: https://github.com/xgu1/DTM)
Open Datasets | Yes | Next, we compare the performance of IForest, LODA, LOF, DTM2, kNN and kth-NN on 23 real datasets from the ODDS library [25]. We consider six high-dimensional real datasets from the UCI library [26] (see [12] for details). [25] Shebuti Rayana. ODDS library. http://odds.cs.stonybrook.edu, 2016. [26] A. Frank and A. Asuncion. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml, 2010.
Dataset Splits | No | The paper mentions evaluating methods on benchmark datasets and real datasets, but it does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or a detailed splitting methodology).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions various algorithms and methods but does not specify any software dependencies (e.g., libraries, frameworks, or programming languages) with version numbers.
Experiment Setup | Yes | For all our experiments, we set the following hyperparameters for our models: sub-sampling size = 256 and the number of trees = 100 for IForest; k = 0.03 × (sample size) for all distance-based methods for comparable results; for LODA, we use 100 projections, with each projection using approximately √d features.
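As a rough illustration of the LODA configuration quoted above (100 sparse one-dimensional random projections, each using roughly √d of the d features, scored by a histogram density estimate), here is a hedged NumPy sketch. The histogram binning, the smoothing constant, and all names are our own simplification, not the authors' implementation:

```python
import numpy as np

def loda_scores(X, n_projections=100, n_bins=10, rng=None):
    """LODA-style anomaly scores: average negative log-density of each
    point over sparse 1-D random projections, where each projection
    uses ~sqrt(d) of the d features (higher score = more anomalous)."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    k = max(1, int(round(np.sqrt(d))))       # ~sqrt(d) nonzero features
    neg_log_dens = np.zeros(n)
    for _ in range(n_projections):
        w = np.zeros(d)
        idx = rng.choice(d, size=k, replace=False)
        w[idx] = rng.standard_normal(k)      # sparse random direction
        z = X @ w                            # 1-D projection of the data
        hist, edges = np.histogram(z, bins=n_bins, density=True)
        # map each projected value to its histogram bin
        b = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bins - 1)
        neg_log_dens += -np.log(hist[b] + 1e-12)
    return neg_log_dens / n_projections
```

Points that repeatedly land in low-density histogram bins across many projections accumulate a large average negative log-density, which is what flags them as anomalous.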