Open Category Detection with PAC Guarantees
Authors: Si Liu, Risheek Garrepalli, Thomas Dietterich, Alan Fern, Dan Hendrycks
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements. We carry out experiments on synthetic and benchmark datasets using a state-of-the-art anomaly detector, the Isolation Forest (Liu et al., 2008). |
| Researcher Affiliation | Academia | (1) Department of Statistics, Oregon State University, Oregon, USA; (2) School of EECS, Oregon State University, Oregon, USA; (3) University of California, Berkeley, California, USA. |
| Pseudocode | Yes | Algorithm 1. Step 1: Get anomaly scores for all points in $S_0$ and $S_m$, denoted $x_1, x_2, \ldots, x_k$ and $y_1, y_2, \ldots, y_n$ respectively. Step 2: Compute empirical CDFs $\hat{F}_0$ and $\hat{F}_m$. Step 3: Calculate $\hat{F}_a$ using equation 1. Step 4: Output detection threshold $\hat{\tau}_q = \max\{u \in S : \hat{F}_a(u) \le q\}$, where $S = \{x_1, x_2, \ldots, x_k, y_1, y_2, \ldots, y_n\}$. |
| Open Source Code | Yes | Code for reproducing our experiments can be found at https://github.com/liusi2019/ocd. |
| Open Datasets | Yes | Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements. We performed experiments on six UCI multiclass datasets: Landsat, Opt.digits, pageb, Shuttle, Covertype and MNIST. In addition to these, we provide results for the Tiny ImageNet dataset. |
| Dataset Splits | Yes | After computing the anomaly scores for both nominal and mixture datasets, we applied Algorithm 1 within a 10-fold cross validation. We divide the mixture data points at random into 10 groups. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions specific anomaly detectors like 'Isolation Forest' and 'LODA' but does not provide version numbers for them or any other software dependencies. |
| Experiment Setup | Yes | The Isolation Forest algorithm computes 1000 full depth isolation trees on the nominal data. Each tree is grown on a randomly-selected 20% subsample of the clean data points. We fixed the target quantile to be q = 0.05. |
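
The four steps of Algorithm 1 quoted in the Pseudocode row can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation (their code is at the repository linked above). It rests on several assumptions that the table does not state: equation 1 is taken to be the mixture inversion $\hat{F}_a(u) = (\hat{F}_m(u) - (1-\alpha)\hat{F}_0(u)) / \alpha$, where `alpha` is the assumed proportion of alien points in the mixture sample; higher anomaly scores are assumed to mean "more anomalous"; and returning negative infinity when no candidate satisfies the constraint is a choice made here for illustration only. The function names `ecdf` and `detection_threshold` are hypothetical.

```python
import numpy as np


def ecdf(sample, points):
    """Empirical CDF of `sample`, evaluated at every value in `points`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the sample values that are <= each query point.
    return np.searchsorted(sample, points, side="right") / sample.size


def detection_threshold(nominal_scores, mixture_scores, alpha, q=0.05):
    """Sketch of Algorithm 1: largest u in S with F_a_hat(u) <= q.

    `alpha` (assumed alien proportion in the mixture) and the form of
    equation 1 used below are assumptions, not taken from the table.
    """
    x = np.asarray(nominal_scores, dtype=float)   # scores on S_0
    y = np.asarray(mixture_scores, dtype=float)   # scores on S_m
    s = np.sort(np.concatenate([x, y]))           # candidate set S

    f0 = ecdf(x, s)                               # step 2: F_0_hat
    fm = ecdf(y, s)                               # step 2: F_m_hat
    fa = (fm - (1.0 - alpha) * f0) / alpha        # step 3: assumed equation 1

    candidates = s[fa <= q]                       # step 4: feasible thresholds
    if candidates.size == 0:
        # Illustrative fallback (an assumption): no feasible threshold,
        # so every point would be treated as a potential alien.
        return -np.inf
    return candidates.max()
```

Under these assumptions, a score above the returned threshold would be flagged as a possible alien, and the quoted setup would correspond to calling `detection_threshold(x, y, alpha, q=0.05)` on the Isolation Forest scores of the nominal and mixture sets.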