Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

Authors: Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical study shows that Split-Ensemble, without additional computational cost, improves accuracy over a single model by 0.8%, 1.8%, and 25.5% on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. OOD detection for the same backbone and in-distribution datasets surpasses a single-model baseline by 2.2%, 8.1%, and 29.6% in mean AUROC, respectively. (AUROC sketch below the table.)
Researcher Affiliation | Collaboration | ¹School of Computer Science, Peking University; ²University of California, Berkeley; ³Panasonic Holdings Corporation; ⁴Carnegie Mellon University.
Pseudocode | Yes | The detailed process of Split-Ensemble training is provided in the pseudo-code in Algorithm 1 of Appendix B.
Open Source Code | No | The paper does not state that code for the described methodology will be released, nor does it provide a link to a code repository.
Open Datasets | Yes | We perform classification tasks on four popular image classification benchmarks, including CIFAR-10, CIFAR-100 (Krizhevsky, 2009), Tiny ImageNet (Deng et al., 2009) and ImageNet (Krizhevsky et al., 2012) datasets. (Dataset-loading sketch below the table.)
Dataset Splits | No | The paper mentions 'test(val) sets' and discusses training and testing phases, but it does not give explicit training/validation/test split percentages, absolute sample counts, or citations to predefined splits that would define these proportions.
Hardware Specification | Yes | Our Split-Ensemble model was trained over 200 epochs using a single NVIDIA A100 GPU with 80GB of memory for experiments involving the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. For the larger-scale ImageNet dataset, we employ 8 NVIDIA A100 GPUs, each with 80GB memory, to handle the increased computational demands.
Software Dependencies | No | The paper mentions using a library from (Kirchheim et al., 2022), i.e. pytorch-ood, for Gaussian and Uniform noise generation, but it does not specify the version of PyTorch or of any other core software dependency used in its own implementation. (Noise-generation sketch below the table.)
Experiment Setup | Yes | We use an SGD optimizer with a momentum of 0.9 and weight decay of 0.0005. We also adopt a 200-epoch cosine learning rate schedule with 10 warm-up epochs and a batch size of 256. (Training-configuration sketch below the table.)
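
The Research Type row reports OOD-detection gains in mean AUROC. The sketch below shows how a single AUROC value is typically computed from per-sample OOD scores; the function name, the toy score distributions, and the convention that higher scores mean "more OOD" are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: AUROC for OOD detection from per-sample OOD scores.
# Higher score = more likely out-of-distribution. All values are illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(scores_id: np.ndarray, scores_ood: np.ndarray) -> float:
    """AUROC with OOD samples treated as the positive class."""
    labels = np.concatenate([np.zeros(len(scores_id), dtype=int),
                             np.ones(len(scores_ood), dtype=int)])
    scores = np.concatenate([scores_id, scores_ood])
    return roc_auc_score(labels, scores)

rng = np.random.default_rng(0)
scores_id = rng.normal(0.2, 0.1, size=1000)   # in-distribution: lower OOD scores
scores_ood = rng.normal(0.6, 0.1, size=1000)  # out-of-distribution: higher OOD scores
print(f"AUROC: {ood_auroc(scores_id, scores_ood):.3f}")

# The "mean AUROC" quoted in the table would then be the average of such values
# over all OOD test sets for a given in-distribution dataset.
```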
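
The Open Datasets row lists CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet. A minimal sketch of obtaining the two CIFAR benchmarks through torchvision follows; the root path and normalization constants are assumptions, and Tiny ImageNet and ImageNet require separate downloads not covered here.

```python
# Minimal sketch: loading the CIFAR benchmarks named in the Open Datasets row.
import torchvision
import torchvision.transforms as T

# Assumed normalization constants (standard CIFAR-10 statistics).
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
cifar100_test = torchvision.datasets.CIFAR100(
    root="./data", train=False, download=True, transform=transform)
```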
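
The Software Dependencies row notes that Gaussian and Uniform noise OOD inputs come from the pytorch-ood library (Kirchheim et al., 2022). The sketch below reproduces the idea in plain PyTorch rather than calling that library, so the tensor shapes, value ranges, and function names are assumptions for illustration only, not the library's exact behavior.

```python
# Illustrative sketch: synthetic noise images used as OOD test inputs.
import torch

def gaussian_noise_batch(n: int, size=(3, 32, 32)) -> torch.Tensor:
    # Gaussian noise shifted/scaled and clipped to the [0, 1] image range.
    return torch.randn(n, *size).mul(0.5).add(0.5).clamp(0.0, 1.0)

def uniform_noise_batch(n: int, size=(3, 32, 32)) -> torch.Tensor:
    # Uniform noise already in [0, 1].
    return torch.rand(n, *size)

ood_gauss = gaussian_noise_batch(256)   # batch of Gaussian-noise "images"
ood_unif = uniform_noise_batch(256)     # batch of Uniform-noise "images"
```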
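
The Experiment Setup row quotes the optimizer and schedule hyperparameters. A hedged sketch of that recipe in PyTorch follows: SGD (momentum 0.9, weight decay 5e-4) with a 200-epoch cosine schedule and 10 warm-up epochs at batch size 256. The backbone, base learning rate, and warm-up start factor are placeholders not given in the table.

```python
# Hedged sketch of the quoted training recipe; model and base LR are assumed.
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=100)  # placeholder backbone
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,  # assumed base LR
                            momentum=0.9, weight_decay=5e-4)

# 10 warm-up epochs followed by cosine annealing over the remaining 190 epochs.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=10)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=190)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[10])

# The DataLoader would use batch_size=256; call scheduler.step() once per epoch.
```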