Further Analysis of Outlier Detection with Deep Generative Models
Authors: Ziyu Wang, Bin Dai, David Wipf, Jun Zhu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present a possible explanation for this phenomenon, starting from the observation that a model's typical set and high-density region may not coincide. From this vantage point we propose a novel outlier test, the empirical success of which suggests that the failure of existing likelihood-based outlier tests does not necessarily imply that the corresponding generative model is uncalibrated. We also conduct additional experiments to help disentangle the impact of low-level texture versus high-level semantics in differentiating outliers. In this section we evaluate the proposed test, with the goal of better understanding the previous findings in [3]. We consider three implementations of our white noise test, which use different sequences to compute the test statistics (1). (A generic sketch of such a white-noise check appears below this table.) |
| Researcher Affiliation | Collaboration | Ziyu Wang (1,2), Bin Dai (3), David Wipf (4), and Jun Zhu (1,2). 1: Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, China; 2: Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University; 3: Samsung Research China, Beijing, China; 4: AWS AI Lab, Shanghai, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for the experiments is available at https://github.com/thu-ml/ood-dgm. |
| Open Datasets | Yes | We use CIFAR-10, CelebA, and Tiny ImageNet images as inliers, and CIFAR-10, CelebA and SVHN images as outliers. Note that both CIFAR datasets have been created from the 80 Million Tiny Images dataset [21]. |
| Dataset Splits | No | The paper mentions using 'inlier test data' and 'outlier datasets' but does not specify explicit train/validation/test splits with percentages or sample counts. It refers to 'inlier test data' as the evaluation set, without detailing how it was partitioned from a larger training set or if a separate validation set was used for model tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running its experiments. It only vaguely mentions 'within the limit of computational resources we have' without further elaboration. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions) needed to replicate the experiment. |
| Experiment Setup | No | The paper refers to using 'the setups from the paper' for pretrained models or training 'under the same setup as in the original papers' for other DGMs and VAEs. While it mentions varying 'nz' for VAEs (e.g., 'nz = 64' or 'nz = 512' in Table 1), it does not provide comprehensive details on concrete hyperparameters such as learning rates, batch sizes, optimizers, or training epochs directly within the main text. |
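The quoted abstract above refers to a "white noise test" computed from different sequences via the paper's test statistic (1). The paper's exact statistic is not reproduced here; the following is a minimal sketch of a generic portmanteau (Ljung-Box style) white-noise check on a 1-D sequence, illustrating the general idea that an inlier's latent or residual sequence should show no autocorrelation. The function name `ljung_box_statistic` and the `max_lag` parameter are illustrative choices, not names from the paper or its code release.

```python
import numpy as np
from scipy import stats


def ljung_box_statistic(seq, max_lag=20):
    """Portmanteau (Ljung-Box) white-noise statistic for a 1-D sequence.

    Under the null hypothesis that `seq` is white noise, the statistic
    is approximately chi-squared with `max_lag` degrees of freedom.
    This is a generic stand-in for the paper's test statistic (1).
    """
    x = np.asarray(seq, dtype=np.float64)
    x = x - x.mean()
    n = len(x)
    denom = np.dot(x, x)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.dot(x[:-k], x[k:]) / denom  # lag-k sample autocorrelation
        q += rho_k ** 2 / (n - k)
    q *= n * (n + 2)
    p_value = stats.chi2.sf(q, df=max_lag)
    return q, p_value


# Toy usage: an i.i.d. Gaussian sequence behaves like white noise,
# while a random walk is strongly autocorrelated and yields a large Q
# (small p-value), i.e. it would be flagged as deviating from the null.
rng = np.random.default_rng(0)
inlier_like = rng.standard_normal(1024)
outlier_like = np.cumsum(rng.standard_normal(1024))
print(ljung_box_statistic(inlier_like))
print(ljung_box_statistic(outlier_like))
```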