Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble
Authors: Chenhui Xu, Fuxun Yu, Zirui Xu, Nathan Inkawhich, Xiang Chen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate the superior performance of the MC Ensemble strategy in the OOD detection task compared to both the naive Deep Ensemble method and the standalone model of comparable size. ... We evaluate Multi-Comprehension Ensemble on two benchmarks: CIFAR Benchmark and ImageNet Benchmark. (A generic sketch of ensemble-based OOD scoring follows the table.) |
| Researcher Affiliation | Collaboration | 1 George Mason University, 2 Air Force Research Laboratory. Correspondence to: Xiang Chen <xchen26@gmu.edu>; First Author: Chenhui Xu <cxu21@gmu.edu>. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code for their methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | Datasets: We evaluate Multi-Comprehension Ensemble on two benchmarks: CIFAR Benchmark and ImageNet Benchmark. In CIFAR Benchmark, CIFAR10 (Krizhevsky et al., 2009) is used as ID dataset, and SVHN (Netzer et al., 2011), iSUN (Xu et al., 2015), LSUN (Yu et al., 2015), Texture (Cimpoi et al., 2014) and Places365 (Zhou et al., 2017) are used as OOD datasets. Furthermore, CIFAR100 (Krizhevsky et al., 2009) is also tested as OOD to evaluate near-OOD performance. In ImageNet Benchmark, ImageNet-1K (Deng et al., 2009) is used as the ID dataset, and Places365 (Zhou et al., 2017), SUN (Xiao et al., 2010), Texture (Cimpoi et al., 2014) and iNaturalist (Van Horn et al., 2018) are used as OOD datasets. (A dataset-loading sketch follows the table.) |
| Dataset Splits | No | The paper lists the datasets used for ID data (e.g., CIFAR10, ImageNet-1K), which implies they are used for training. However, it does not explicitly provide the training/validation/test split percentages or sample counts, nor does it explicitly state that standard predefined splits are used for these datasets. |
| Hardware Specification | Yes | We use 4 NVIDIA A100s for model training. ... utilized Nvidia MIG technology to let these models run simultaneously on the same A100 GPU |
| Software Dependencies | No | The paper mentions using models from 'torchvision' (Paszke et al., 2019) and following the original settings of Chen et al. (2020) and Khosla et al. (2020), which implies reliance on specific software libraries. However, it does not provide version numbers for software dependencies such as PyTorch, Python, or other libraries used for implementation. |
| Experiment Setup | Yes | Training details: We use ResNet-18 as the backbone of individual models for the CIFAR10 benchmark. The number of individual models M in the ensemble is set to 3. We train the SupCE model with the cross-entropy loss with SGD for 500 epochs, with a batch size of 512. The learning rate starts at 0.5 with a cosine annealing schedule (Loshchilov & Hutter, 2017). (A training-recipe sketch follows the table.) |
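
To make the Research Type row concrete, here is a minimal sketch of ensemble-based OOD scoring in PyTorch. It illustrates the naive Deep Ensemble baseline the paper compares against (averaged softmax probabilities scored by maximum softmax probability), not the paper's MC Ensemble itself; the function name and thresholding convention are illustrative assumptions.

```python
import torch

@torch.no_grad()
def ensemble_msp_score(models, x):
    """Higher score -> more likely in-distribution.

    Averages the M ensemble members' softmax outputs, then takes the
    maximum class probability (MSP) as the ID-ness score.
    """
    probs = torch.stack([m(x).softmax(dim=1) for m in models]).mean(dim=0)
    return probs.max(dim=1).values

# Usage (hypothetical): a batch is flagged OOD when its score falls
# below a threshold chosen on held-out ID data (e.g., at 95% TPR).
# scores = ensemble_msp_score([model1, model2, model3], images)
# is_ood = scores < threshold
```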
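The Open Datasets row can likewise be made concrete. Below is a minimal sketch, assuming torchvision and local download paths, of loading the CIFAR-benchmark ID set and a subset of the OOD sets; the transform, normalization constants, and the Texture folder path are assumptions, and iSUN/LSUN (typically distributed as pre-cropped image folders) are omitted for brevity.

```python
import torchvision.datasets as dsets
import torchvision.transforms as T

transform = T.Compose([
    T.Resize((32, 32)),                       # match CIFAR10 resolution
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465),     # standard CIFAR10 stats (assumed)
                (0.2470, 0.2435, 0.2616)),
])

# In-distribution: CIFAR10 test split
id_set = dsets.CIFAR10(root="./data", train=False, download=True,
                       transform=transform)

# Out-of-distribution sets (paths are hypothetical)
ood_sets = {
    "SVHN": dsets.SVHN(root="./data", split="test", download=True,
                       transform=transform),
    "Texture": dsets.ImageFolder("./data/dtd/images", transform=transform),
    "Places365": dsets.Places365(root="./data", split="val", small=True,
                                 download=True, transform=transform),
}
```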
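Finally, a sketch of the quoted SupCE training recipe from the Experiment Setup row: ResNet-18 trained with cross-entropy via SGD for 500 epochs, batch size 512, initial learning rate 0.5 with cosine annealing. Momentum, weight decay, and the data augmentations are not stated in the quote and are assumed typical defaults here; CIFAR-style ResNets also usually replace the 7x7 stem, which this sketch does not do.

```python
import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets
import torchvision.transforms as T
from torchvision.models import resnet18

# CIFAR10 training split; augmentation choices are assumptions.
train_tf = T.Compose([T.RandomCrop(32, padding=4),
                      T.RandomHorizontalFlip(),
                      T.ToTensor()])
train_set = dsets.CIFAR10("./data", train=True, download=True,
                          transform=train_tf)
loader = DataLoader(train_set, batch_size=512, shuffle=True, num_workers=8)

# Stated in the quote: ResNet-18, SGD, lr 0.5, 500 epochs, batch 512,
# cosine annealing. Momentum/weight decay below are assumed defaults.
model = resnet18(num_classes=10).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9,
                            weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

for epoch in range(500):
    for images, labels in loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one cosine step per epoch
```

Repeating this recipe with M = 3 differently trained members (the paper's stated ensemble size) would yield the models consumed by the scoring sketch above.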