Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution

Authors: Xueying Ding, Lingxiao Zhao, Leman Akoglu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the first part of this paper, we conduct the first large-scale analysis on the HP sensitivity of deep OD methods, and through more than 35,000 trained models, quantitatively demonstrate that model selection is inevitable. Next, we design a HP-robust and scalable deep hyper-ensemble model called ROBOD that assembles models with varying HP configurations, bypassing the choice paralysis. Importantly, we introduce novel strategies to speed up ensemble training, such as parameter sharing, batch/simultaneous training, and data subsampling, that allow us to train fewer models with fewer parameters. Extensive experiments on both image and tabular datasets show that ROBOD achieves and retains robust, state-of-the-art detection performance as compared to its modern counterparts, while taking only 2-10% of the time by the naïve hyper-ensemble with independent training. (A hedged sketch of such a naïve hyper-ensemble baseline appears after this table.)
Researcher Affiliation | Academia | Xueying Ding, Carnegie Mellon University (xding2@cs.cmu.edu); Lingxiao Zhao, Carnegie Mellon University (lingxiao@cmu.edu); Leman Akoglu, Carnegie Mellon University (lakoglu@cs.cmu.edu)
Pseudocode | No | The paper describes methods and processes but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | To foster future research, we open source all code and datasets at https://github.com/xyvivian/ROBOD. ... We open-source our source code (including datasets, our method as well as SOTA methods implementations) at https://github.com/xyvivian/ROBOD, to which we point in the Introduction section.
Open Datasets | Yes | Datasets. For evaluation we consider both image and point datasets. As in the original papers [4, 35], we use MNIST and CIFAR10 to construct OD tasks... We conduct experiments on 5 image datasets from MNIST and CIFAR10, as well as 3 tabular datasets from the ODDS repository (http://odds.cs.stonybrook.edu/).
Dataset Splits | No | The paper describes training on 'clean (inlier only) data' that is 'tested on a disjoint test dataset', as well as a 'transductive setting where the train data is the same as the test data', implying train and test sets. However, it does not explicitly specify a separate validation split with percentages or counts for its own experiments; 'labeled validation' data is mentioned only as something other works rely on.
Hardware Specification | Yes | All models are trained on an NVIDIA RTX A6000 GPU server.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | Configurations. The baselines exhibit 2-8 HPs, per which we define a small grid of values (see Appx. A.5, Table 10). We report the expected AUROC performance, i.e., averaged across all configurations in the grid, along with the standard deviation. For ROBOD (and variants) we set L=6 and K=8; the other HP configurations are listed in Appx. Table 11. (A short sketch of this expected-AUROC computation also appears below.)
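
To make the "naïve hyper-ensemble with independent training" concrete, here is a minimal PyTorch sketch: one small autoencoder is trained per hyperparameter configuration, and z-normalized reconstruction errors are averaged into an ensemble outlier score. This only illustrates the baseline ROBOD is compared against, not the paper's parameter-sharing/batched-training implementation; the grid, model, and function names (HP_GRID, Autoencoder, hyper_ensemble_scores) are hypothetical.

```python
# A minimal, hedged sketch of a naive hyper-ensemble over an HP grid
# (independent training per configuration). NOT the paper's ROBOD
# implementation; HP_GRID, Autoencoder, and hyper_ensemble_scores are
# hypothetical names introduced for illustration only.
import itertools

import numpy as np
import torch
import torch.nn as nn

HP_GRID = {                       # a small, illustrative grid of HP values
    "width": [32, 64],            # hidden-layer width
    "depth": [2, 4],              # number of encoder layers
    "dropout": [0.0, 0.2],        # dropout rate
}


class Autoencoder(nn.Module):
    """Fully connected autoencoder whose shape is set by one HP configuration."""

    def __init__(self, d_in, width, depth, dropout):
        super().__init__()
        enc, dec, d = [], [], d_in
        for _ in range(depth):
            enc += [nn.Linear(d, width), nn.ReLU(), nn.Dropout(dropout)]
            d = width
        for _ in range(depth - 1):
            dec += [nn.Linear(width, width), nn.ReLU()]
        dec += [nn.Linear(width, d_in)]
        self.encoder, self.decoder = nn.Sequential(*enc), nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_one(model, x, epochs=20, lr=1e-3):
    """Train a single ensemble member on the (unlabeled) data x by MSE reconstruction."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(x) - x) ** 2).mean()
        loss.backward()
        opt.step()


def hyper_ensemble_scores(x):
    """Train one model per HP configuration (independently) and average
    z-normalized reconstruction-error scores -- the naive hyper-ensemble."""
    per_config_scores = []
    for width, depth, dropout in itertools.product(*HP_GRID.values()):
        model = Autoencoder(x.shape[1], width, depth, dropout)
        train_one(model, x)
        with torch.no_grad():
            err = ((model(x) - x) ** 2).mean(dim=1).numpy()
        per_config_scores.append((err - err.mean()) / (err.std() + 1e-12))
    ensemble_score = np.mean(per_config_scores, axis=0)  # higher = more outlying
    return ensemble_score, per_config_scores
```

ROBOD itself avoids this independent-training loop through parameter sharing, batch/simultaneous training, and data subsampling, which is how the paper reports reducing the cost to 2-10% of such a baseline.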
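The expected-AUROC reporting described in the Experiment Setup row can likewise be sketched in a few lines, assuming per-configuration score arrays (e.g., from the loop above) and binary ground-truth labels that are used for evaluation only:

```python
# Hedged sketch of the "expected AUROC" reporting: mean and standard deviation
# of AUROC across all HP configurations in the grid. Assumes per_config_scores
# (one score array per configuration) and binary labels y_true (1 = outlier).
import numpy as np
from sklearn.metrics import roc_auc_score


def expected_auroc(per_config_scores, y_true):
    """Return (mean, std) of AUROC over the HP grid."""
    aurocs = np.array([roc_auc_score(y_true, s) for s in per_config_scores])
    return aurocs.mean(), aurocs.std()
```

Reporting the mean together with the standard deviation is what lets the paper quantify how strongly a baseline's detection performance fluctuates across its HP grid.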