Linear-Time Outlier Detection via Sensitivity
Authors: Mario Lucic, Olivier Bachem, Andreas Krause
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an extensive experimental evaluation, we demonstrate the effectiveness and establish the statistical significance of the proposed approach. In particular, it outperforms the most popular distance-based approaches while being several orders of magnitude faster. |
| Researcher Affiliation | Academia | Mario Lucic ETH Zurich lucic@inf.ethz.ch Olivier Bachem ETH Zurich olivier.bachem@inf.ethz.ch Andreas Krause ETH Zurich krausea@ethz.ch |
| Pseudocode | Yes | Algorithm 1 INFLUENCE and Algorithm 2 DISTRIBUTED INFLUENCE are provided (a hedged code sketch of the sensitivity-style scoring idea appears after this table). |
| Open Source Code | No | The paper mentions implementation details in a footnote: 'The algorithms are implemented in Python 2.7 using NumPy and SciPy libraries and Cython for performance-critical operations.' However, it does not state that the code is publicly available or provide a link. |
| Open Datasets | Yes | The experimental evaluation is applied on a variety of real-world data sets available on UCI [Asuncion and Newman, 2007] as well as on synthetic data sets. ... The relevant information is summarized in Table 2. |
| Dataset Splits | No | The paper mentions evaluation using AUPRC but does not explicitly provide details about train/validation/test splits (e.g., percentages or specific files) for the datasets used in experiments. |
| Hardware Specification | Yes | The experiments were run on an Intel Xeon 3.3 GHz machine with 36 cores and 1.5 TB of RAM. |
| Software Dependencies | Yes | The algorithms are implemented in Python 2.7 using NumPy and SciPy libraries and Cython for performance-critical operations. |
| Experiment Setup | Yes | Parameters. We follow the parameter settings commonly used or suggested by the authors. For KNN and LOF we set k = 10 and k = 5, respectively [Bay and Schwabacher, 2003; Bhaduri et al., 2011; Orair et al., 2010]. For both ONE-TIME SAMPLING and ITERATIVE SAMPLING we set s = 20 and additionally k = 5 for ITERATIVE SAMPLING [Sugiyama and Borgwardt, 2013]. As our proposal, we apply Algorithm 1 with model averaging and k ∈ ∪_{i=1}^{15} {500/i}. For each algorithm with a random selection process we average 30 runs and we present the mean and variance of the AUPRC score. (A hedged sketch of the AUPRC evaluation protocol appears after this table.) |
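
The Pseudocode row above cites Algorithm 1 INFLUENCE. The sketch below is not the authors' algorithm; it only illustrates the general sensitivity-style idea of fitting a cheap clustering on a small uniform subsample and ranking points by their normalized quantization cost. The function name, the use of scikit-learn's `KMeans`, and the exact score are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def influence_scores_sketch(X, k=50, sample_size=500, seed=0):
    """Sensitivity-style outlier scores (illustrative sketch, not the paper's Algorithm 1).

    Fits a cheap k-means model on a small uniform subsample, then scores every
    point by its squared distance to the nearest center, normalized by the mean
    quantization error, so points that are expensive to represent by any small
    clustering receive high scores.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(sample_size, n), replace=False)

    # Cheap clustering on the subsample only; the full data is never clustered.
    centers = KMeans(n_clusters=k, n_init=3, random_state=seed).fit(X[idx]).cluster_centers_

    # Squared distance of every point to its closest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).min(axis=1)

    # Normalized quantization cost as the outlier score.
    return d2 / d2.mean()
```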
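
For the evaluation protocol in the Experiment Setup row (AUPRC averaged over 30 runs of each randomized algorithm), a minimal sketch using scikit-learn's `average_precision_score` could look as follows; the `score_fn(X, seed)` interface is an assumption for illustration.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def auprc_over_runs(score_fn, X, y_true, runs=30):
    """Mean and variance of AUPRC over repeated runs of a randomized scorer.

    `score_fn(X, seed)` is an assumed interface returning one outlier score per
    point; `y_true` holds the binary ground-truth outlier labels.
    """
    auprcs = np.array([average_precision_score(y_true, score_fn(X, seed=r))
                       for r in range(runs)])
    return auprcs.mean(), auprcs.var()
```

For example, the sketch above could be evaluated with `auprc_over_runs(lambda X, seed: influence_scores_sketch(X, seed=seed), X, y)`.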