Hyperparameter Sensitivity in Deep Outlier Detection: Analysis and a Scalable Hyper-Ensemble Solution

Authors: Xueying Ding, Lingxiao Zhao, Leman Akoglu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the first part of this paper, we conduct the first large-scale analysis on the HP sensitivity of deep OD methods, and through more than 35,000 trained models, quantitatively demonstrate that model selection is inevitable. Next, we design a HP-robust and scalable deep hyper-ensemble model called ROBOD that assembles models with varying HP configurations, bypassing the choice paralysis. Importantly, we introduce novel strategies to speed up ensemble training, such as parameter sharing, batch/simultaneous training, and data subsampling, that allow us to train fewer models with fewer parameters. Extensive experiments on both image and tabular datasets show that ROBOD achieves and retains robust, state-of-the-art detection performance as compared to its modern counterparts, while taking only 2-10% of the time by the naïve hyper-ensemble with independent training. (A hedged sketch of such a naïve hyper-ensemble baseline appears after this table.)
Researcher Affiliation | Academia | Xueying Ding, Carnegie Mellon University (xding2@cs.cmu.edu); Lingxiao Zhao, Carnegie Mellon University (lingxiao@cmu.edu); Leman Akoglu, Carnegie Mellon University (lakoglu@cs.cmu.edu)
Pseudocode | No | The paper describes methods and processes but does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | To foster future research, we open source all code and datasets at https://github.com/xyvivian/ROBOD. ... We open-source our source code (including datasets, our method as well as SOTA methods implementations) at https://github.com/xyvivian/ROBOD, to which we point in the Introduction section.
Open Datasets | Yes | Datasets. For evaluation we consider both image and point datasets. As in the original papers [4, 35], we use MNIST and CIFAR10 to construct OD tasks... We conduct experiments on 5 image datasets from MNIST and CIFAR10, as well as 3 tabular datasets from the ODDS repository (http://odds.cs.stonybrook.edu/).
Dataset Splits | No | The paper describes training on 'clean (inlier only) data' that is 'tested on a disjoint test dataset', as well as a 'transductive setting where the train data is the same as the test data', implying train and test sets. However, it does not explicitly specify a separate validation split with percentages or counts for its own experiments; 'labeled validation' data is mentioned only as something other works rely on.
Hardware Specification | Yes | All models are trained on an NVIDIA RTX A6000 GPU server.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, or library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | Configurations. The baselines exhibit 2-8 HPs, per which we define a small grid of values (see Appx. A.5, Table 10). We report the expected AUROC performance, i.e., averaged across all configurations in the grid, along with the standard deviation. For ROBOD (and variants) we set L=6 and K=8; the other HP configurations are listed in Appx. Table 11. (A short sketch of this expected-AUROC computation also appears below.)
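
To make the "naïve hyper-ensemble with independent training" concrete, here is a minimal PyTorch sketch: one small autoencoder is trained per hyperparameter configuration, and z-normalized reconstruction errors are averaged into an ensemble outlier score. This only illustrates the baseline ROBOD is compared against, not the paper's parameter-sharing/batched-training implementation; the grid, model, and function names (HP_GRID, Autoencoder, hyper_ensemble_scores) are hypothetical.

```python
# A minimal, hedged sketch of a naive hyper-ensemble over an HP grid
# (independent training per configuration). NOT the paper's ROBOD
# implementation; HP_GRID, Autoencoder, and hyper_ensemble_scores are
# hypothetical names introduced for illustration only.
import itertools

import numpy as np
import torch
import torch.nn as nn

HP_GRID = {                       # a small, illustrative grid of HP values
    "width": [32, 64],            # hidden-layer width
    "depth": [2, 4],              # number of encoder layers
    "dropout": [0.0, 0.2],        # dropout rate
}


class Autoencoder(nn.Module):
    """Fully connected autoencoder whose shape is set by one HP configuration."""

    def __init__(self, d_in, width, depth, dropout):
        super().__init__()
        enc, dec, d = [], [], d_in
        for _ in range(depth):
            enc += [nn.Linear(d, width), nn.ReLU(), nn.Dropout(dropout)]
            d = width
        for _ in range(depth - 1):
            dec += [nn.Linear(width, width), nn.ReLU()]
        dec += [nn.Linear(width, d_in)]
        self.encoder, self.decoder = nn.Sequential(*enc), nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_one(model, x, epochs=20, lr=1e-3):
    """Train a single ensemble member on the (unlabeled) data x by MSE reconstruction."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(x) - x) ** 2).mean()
        loss.backward()
        opt.step()


def hyper_ensemble_scores(x):
    """Train one model per HP configuration (independently) and average
    z-normalized reconstruction-error scores -- the naive hyper-ensemble."""
    per_config_scores = []
    for width, depth, dropout in itertools.product(*HP_GRID.values()):
        model = Autoencoder(x.shape[1], width, depth, dropout)
        train_one(model, x)
        with torch.no_grad():
            err = ((model(x) - x) ** 2).mean(dim=1).numpy()
        per_config_scores.append((err - err.mean()) / (err.std() + 1e-12))
    ensemble_score = np.mean(per_config_scores, axis=0)  # higher = more outlying
    return ensemble_score, per_config_scores
```

ROBOD itself avoids this independent-training loop through parameter sharing, batch/simultaneous training, and data subsampling, which is how the paper reports reducing the cost to 2-10% of such a baseline.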
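The expected-AUROC reporting described in the Experiment Setup row can likewise be sketched in a few lines, assuming per-configuration score arrays (e.g., from the loop above) and binary ground-truth labels that are used for evaluation only:

```python
# Hedged sketch of the "expected AUROC" reporting: mean and standard deviation
# of AUROC across all HP configurations in the grid. Assumes per_config_scores
# (one score array per configuration) and binary labels y_true (1 = outlier).
import numpy as np
from sklearn.metrics import roc_auc_score


def expected_auroc(per_config_scores, y_true):
    """Return (mean, std) of AUROC over the HP grid."""
    aurocs = np.array([roc_auc_score(y_true, s) for s in per_config_scores])
    return aurocs.mean(), aurocs.std()
```

Reporting the mean together with the standard deviation is what lets the paper quantify how strongly a baseline's detection performance fluctuates across its HP grid.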