Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble

Authors: Chenhui Xu, Fuxun Yu, Zirui Xu, Nathan Inkawhich, Xiang Chen

ICML 2024

Reproducibility assessment. Each entry below lists the variable, the assessed result, and the LLM response (a supporting paper excerpt or an explanation).

Research Type: Experimental
LLM Response: "In experiments, we demonstrate the superior performance of the MC Ensemble strategy in the OOD detection task compared to both the naive Deep Ensemble method and the standalone model of comparable size. ... We evaluate Multi-Comprehensive Ensemble on two benchmarks: CIFAR Benchmark and ImageNet Benchmark."

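To make the ensemble-scoring idea concrete, here is a minimal sketch of how an ensemble of classifiers can be used for OOD detection by averaging per-model energy scores. This illustrates the general ensemble-scoring pattern only, not the paper's exact aggregation rule; `models`, `x`, and the threshold `tau` are assumed placeholders.

```python
import torch

@torch.no_grad()
def ensemble_ood_score(models, x):
    """Average negative-energy scores across ensemble members.

    Higher score -> more in-distribution (ID); lower -> more likely OOD.
    """
    scores = []
    for m in models:
        m.eval()
        logits = m(x)                                # (batch, num_classes)
        neg_energy = torch.logsumexp(logits, dim=1)  # energy-based ID score
        scores.append(neg_energy)
    return torch.stack(scores, dim=0).mean(dim=0)    # (batch,)

def is_ood(models, x, tau=0.0):
    # Samples whose averaged score falls below the threshold are flagged OOD;
    # in practice tau is calibrated on held-out ID data (e.g., at 95% TPR).
    return ensemble_ood_score(models, x) < tau
```
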
Researcher Affiliation: Collaboration
LLM Response: "George Mason University; Air Force Research Laboratory. Correspondence to: Xiang Chen <xchen26@gmu.edu>; First Author: Chenhui Xu <cxu21@gmu.edu>."

Pseudocode: No
LLM Response: No explicit pseudocode or algorithm blocks were found in the paper.

Open Source Code: No
LLM Response: The paper does not provide an explicit statement about releasing open-source code for its methodology, nor does it include a link to a code repository.

Open Datasets: Yes
LLM Response: "Datasets: We evaluate Multi-Comprehensive Ensemble on two benchmarks: CIFAR Benchmark and ImageNet Benchmark. In CIFAR Benchmark, CIFAR10 (Krizhevsky et al., 2009) is used as ID dataset, and SVHN (Netzer et al., 2011), iSUN (Xu et al., 2015), LSUN (Yu et al., 2015), Texture (Cimpoi et al., 2014) and Places365 (Zhou et al., 2017) are used as OOD datasets. Furthermore, CIFAR100 (Krizhevsky et al., 2009) is also tested as OOD to evaluate near-OOD performance. In ImageNet Benchmark, ImageNet-1K (Deng et al., 2009) is used as the ID dataset, and Places365 (Zhou et al., 2017), SUN (Xiao et al., 2010), Texture (Cimpoi et al., 2014) and iNaturalist (Van Horn et al., 2018) are used as OOD datasets."

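All of the listed datasets are publicly available, and several ship with torchvision. Below is a minimal sketch of the CIFAR-benchmark data setup, assuming CIFAR10 as the ID test set and SVHN as one OOD set; the transform and batch size are illustrative assumptions, not the paper's stated values.

```python
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10, SVHN

# Standard CIFAR10 normalization statistics (assumed, not quoted from the paper).
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

id_test = CIFAR10(root="data", train=False, download=True, transform=transform)
ood_test = SVHN(root="data", split="test", download=True, transform=transform)

id_loader = DataLoader(id_test, batch_size=256, shuffle=False)
ood_loader = DataLoader(ood_test, batch_size=256, shuffle=False)
```
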
Dataset Splits: No
LLM Response: The paper lists the datasets used as ID data (e.g., CIFAR10, ImageNet-1K), which implies they are used for training. However, it does not explicitly provide training/validation/test split percentages or sample counts, nor does it state whether the standard predefined splits of these datasets are used.

Hardware Specification: Yes
LLM Response: "We use 4 NVIDIA A100s for model training. ... utilized Nvidia MIG technology to let these models run simultaneously on the same A100 GPU"

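With MIG enabled, each A100 slice is exposed as its own CUDA device, addressable by its MIG UUID. A minimal sketch of pinning one ensemble member to a specific MIG instance follows; the UUID is a placeholder (real ones are listed by `nvidia-smi -L`).

```python
import os

# Must be set before CUDA is initialized (i.e., before the first torch.cuda call).
# The UUID below is a placeholder; substitute one reported by `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-00000000-1111-2222-3333-444444444444"

import torch

device = torch.device("cuda:0")  # resolves to the selected MIG slice
print(torch.cuda.get_device_name(device))
```
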
Software Dependencies: No
LLM Response: The paper mentions using models from torchvision (Paszke et al., 2019) and following the original settings of Chen et al. (2020) and Khosla et al. (2020), which implies reliance on specific software libraries. However, it does not provide version numbers for dependencies such as PyTorch, Python, or other implementation libraries.

Experiment Setup: Yes
LLM Response: "Training details: We use ResNet-18 as the backbone of individual models for CIFAR10 benchmark. The number of individual models M in the ensemble is set to 3. We train SupCE model with the cross-entropy loss with SGD for 500 epochs, with a batch size of 512. The learning rate starts at 0.5 with a cosine annealing schedule (Loshchilov & Hutter, 2017)."

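The stated recipe maps directly onto a standard PyTorch training loop. Here is a minimal sketch for the SupCE member, using the quoted hyperparameters (500 epochs, batch size 512, initial LR 0.5, cosine annealing); momentum, weight decay, and the bare ToTensor transform are common defaults assumed here, and the CIFAR-specific stem modification of ResNet-18 is omitted.

```python
import torch
import torch.nn as nn
import torchvision.transforms as T
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from torchvision.models import resnet18

train_set = CIFAR10(root="data", train=True, download=True, transform=T.ToTensor())
train_loader = DataLoader(train_set, batch_size=512, shuffle=True, num_workers=4)

model = resnet18(num_classes=10).cuda()
criterion = nn.CrossEntropyLoss()
# Momentum 0.9 and weight decay 1e-4 are assumed defaults, not quoted values.
optimizer = SGD(model.parameters(), lr=0.5, momentum=0.9, weight_decay=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=500)  # anneal LR over 500 epochs

for epoch in range(500):
    model.train()
    for images, labels in train_loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # one cosine step per epoch
```

Per the quoted setup, the full ensemble would repeat this loop for M = 3 members trained with different objectives; only the cross-entropy (SupCE) member is sketched here.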