Linear-Time Outlier Detection via Sensitivity

Authors: Mario Lucic, Olivier Bachem, Andreas Krause

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In an extensive experimental evaluation, we demonstrate the effectiveness and establish the statistical significance of the proposed approach. In particular, it outperforms the most popular distance-based approaches while being several orders of magnitude faster. |
| Researcher Affiliation | Academia | Mario Lucic, ETH Zurich, lucic@inf.ethz.ch; Olivier Bachem, ETH Zurich, olivier.bachem@inf.ethz.ch; Andreas Krause, ETH Zurich, krausea@ethz.ch |
| Pseudocode | Yes | Algorithm 1 INFLUENCE and Algorithm 2 DISTRIBUTED INFLUENCE are provided (a hedged sketch of the sensitivity idea follows the table). |
| Open Source Code | No | The paper mentions implementation details in a footnote: 'The algorithms are implemented in Python 2.7 using NumPy and SciPy libraries and Cython for performance critical operations.' However, it does not state that the code is publicly available or provide a link. |
| Open Datasets | Yes | The experimental evaluation is applied on a variety of real-world data sets available on UCI [Asuncion and Newman, 2007] as well as on synthetic data sets. ... The relevant information is summarized in Table 2. |
| Dataset Splits | No | The paper reports evaluation using AUPRC but does not explicitly provide train/validation/test splits (e.g., percentages or specific files) for the datasets used in the experiments. |
| Hardware Specification | Yes | The experiments were run on an Intel Xeon 3.3GHz machine with 36 cores and 1.5TB of RAM. |
| Software Dependencies | Yes | The algorithms are implemented in Python 2.7 using NumPy and SciPy libraries and Cython for performance critical operations. |
| Experiment Setup | Yes | Parameters. We follow the parameter settings commonly used or suggested by the authors. For KNN and LOF we set k = 10 and k = 5, respectively [Bay and Schwabacher, 2003; Bhaduri et al., 2011; Orair et al., 2010]. For both ONE-TIME SAMPLING and ITERATIVE SAMPLING we set s = 20 and additionally k = 5 for ITERATIVE SAMPLING [Sugiyama and Borgwardt, 2013]. As our proposal, we apply Algorithm 1 with model averaging and k ∈ ⋃_{i=1}^{15} {500/i}. For each algorithm with a random selection process we average 30 runs and we present the mean and variance of the AUPRC score. (An illustrative evaluation sketch follows the table.) |
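
For context on the Pseudocode row: Algorithm 1 (INFLUENCE) scores points by their sensitivity with respect to a k-means-style clustering. The sketch below is a hedged reconstruction of that idea, not the paper's exact procedure: the seeding routine, the constant `alpha`, and the precise form of the sensitivity bound are assumptions made for illustration, so consult the paper's Algorithm 1 for the actual definitions.

```python
import numpy as np

def d2_seeding(X, k, rng):
    """Cheap k-means++-style (D^2-sampling) solution used as the reference clustering."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        idx = rng.choice(n, p=d2 / d2.sum())      # sample proportional to squared distance
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.asarray(centers)

def influence_scores(X, k=50, alpha=16.0, seed=0):
    """Sensitivity-style outlier scores w.r.t. a rough k-means solution.

    Illustrative reconstruction only: `alpha` and the exact form of the
    bound are assumptions, not the constants of the paper's Algorithm 1.
    """
    rng = np.random.default_rng(seed)
    B = d2_seeding(X, k, rng)
    d = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)   # (n, k) squared distances
    assign = d.argmin(axis=1)                                # closest center per point
    cost = d[np.arange(len(X)), assign]                      # per-point quantization cost
    c_phi = cost.mean()                                      # average cost over the data set
    cluster_cost = np.bincount(assign, weights=cost, minlength=k)
    cluster_size = np.bincount(assign, minlength=k)
    # Points that are expensive relative to the average, or that live in small
    # clusters, receive a large score and are flagged as outliers.
    return (alpha * cost / c_phi
            + 2.0 * alpha * cluster_cost[assign] / (cluster_size[assign] * c_phi)
            + 4.0 * len(X) / cluster_size[assign])
```

The design intuition, which the paper makes precise, is that a point's sensitivity is large when it is far from every center of the rough solution or sits in a sparsely populated cluster, which is exactly the behaviour one wants from an outlier score.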
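
The Experiment Setup row also translates into a small evaluation harness. The sketch below is an assumption-laden illustration using scikit-learn, which the paper does not mention (its implementation relies on NumPy, SciPy and Cython); the function names are hypothetical. It wires up the quoted baseline parameters (k = 10 for KNN, k = 5 for LOF) and the mean/variance AUPRC reporting over repeated runs for randomized methods.

```python
import numpy as np
from sklearn.metrics import average_precision_score
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def knn_score(X, k=10):
    """KNN baseline (k = 10): distance to the k-th nearest neighbour."""
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return dist[:, -1]          # column 0 is the point itself

def lof_score(X, k=5):
    """LOF baseline (k = 5); larger values indicate stronger outliers."""
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    return -lof.negative_outlier_factor_

def mean_var_auprc(score_runs, y_true):
    """Mean and variance of AUPRC over repeated runs of a (randomized) scorer."""
    vals = [average_precision_score(y_true, s) for s in score_runs]
    return float(np.mean(vals)), float(np.var(vals))

# Deterministic baselines (KNN, LOF) need a single run; methods with a random
# selection step (e.g. the sampling baselines with s = 20, or the proposal)
# would be scored 30 times and summarized with mean_var_auprc, as in the setup.
```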