Open Category Detection with PAC Guarantees

Authors: Si Liu, Risheek Garrepalli, Thomas Dietterich, Alan Fern, Dan Hendrycks

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements. We carry out experiments on synthetic and benchmark datasets using a state-of-the-art anomaly detector, the Isolation Forest (Liu et al., 2008).
Researcher Affiliation | Academia | (1) Department of Statistics, Oregon State University, Oregon, USA; (2) School of EECS, Oregon State University, Oregon, USA; (3) University of California, Berkeley, California, USA.
Pseudocode | Yes | Algorithm 1. Step 1: Get anomaly scores for all points in $S_0$ and $S_m$, denoted $x_1, x_2, \ldots, x_k$ and $y_1, y_2, \ldots, y_n$ respectively. Step 2: Compute empirical CDFs $\hat{F}_0$ and $\hat{F}_m$. Step 3: Calculate $\hat{F}_a$ using equation 1. Step 4: Output detection threshold $\hat{\tau}_q = \max\{u \in S : \hat{F}_a(u) \le q\}$, where $S = \{x_1, x_2, \ldots, x_k, y_1, y_2, \ldots, y_n\}$.
Open Source Code | Yes | Code for reproducing our experiments can be found at https://github.com/liusi2019/ocd.
Open Datasets | Yes | Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements. We performed experiments on six UCI multiclass datasets: Landsat, Opt.digits, pageb, Shuttle, Covertype and MNIST. In addition to these, we provide results for the Tiny ImageNet dataset.
Dataset Splits | Yes | After computing the anomaly scores for both nominal and mixture datasets, we applied Algorithm 1 within a 10-fold cross validation. We divide the mixture data points at random into 10 groups.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies | No | The paper mentions specific anomaly detectors like 'Isolation Forest' and 'LODA' but does not provide version numbers for them or any other software dependencies.
Experiment Setup | Yes | The Isolation Forest algorithm computes 1000 full depth isolation trees on the nominal data. Each tree is grown on a randomly-selected 20% subsample of the clean data points. We fixed the target quantile to be q = 0.05.
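
The four steps of Algorithm 1 quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code. Equation 1 is not reproduced in this summary, so the sketch assumes it is the mixture decomposition $\hat{F}_a(u) = (\hat{F}_m(u) - (1 - \alpha)\hat{F}_0(u)) / \alpha$, where `alpha`, the proportion of alien points in the mixture data, is an assumed input. The function names are hypothetical, and higher scores are assumed to mean "more anomalous".

```python
from bisect import bisect_right


def empirical_cdf(sorted_scores, u):
    """Fraction of the (ascending-sorted) scores that are <= u."""
    return bisect_right(sorted_scores, u) / len(sorted_scores)


def detection_threshold(nominal_scores, mixture_scores, alpha, q=0.05):
    """Sketch of Algorithm 1: largest candidate u with estimated alien CDF <= q.

    nominal_scores: anomaly scores x_1..x_k of the clean data S_0 (step 1)
    mixture_scores: anomaly scores y_1..y_n of the mixture data S_m (step 1)
    alpha: assumed proportion of aliens in the mixture (not given in this summary)
    q: target quantile; 0.05 is the value quoted in the Experiment Setup row
    Returns None when no candidate satisfies the condition.
    """
    x = sorted(nominal_scores)
    y = sorted(mixture_scores)
    # Candidate thresholds are the observed scores, S = {x_1..x_k, y_1..y_n}.
    candidates = sorted(set(x) | set(y))

    threshold = None
    for u in candidates:
        f0 = empirical_cdf(x, u)  # step 2: empirical CDF of nominal scores
        fm = empirical_cdf(y, u)  # step 2: empirical CDF of mixture scores
        # Step 3 (assumed form of equation 1): invert F_m = (1 - alpha) F_0 + alpha F_a.
        fa = (fm - (1 - alpha) * f0) / alpha
        # Step 4: the estimate need not be monotone, so check every candidate
        # and keep the largest one that passes (candidates are ascending).
        if fa <= q:
            threshold = u
    return threshold


if __name__ == "__main__":
    nominal = [1, 2, 3, 4]
    mixture = [1, 2, 3, 4, 7, 8, 9, 10]  # toy data: half nominal, half alien
    print(detection_threshold(nominal, mixture, alpha=0.5, q=0.25))
```

The scan checks every candidate rather than stopping at the first failure, because the difference of two empirical CDFs is not guaranteed to be non-decreasing. In the 10-fold procedure described in the Dataset Splits row, a function like this would presumably be called once per fold; how the folds are combined is not specified in this summary.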