Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting
Authors: Anthony Chen, Huanrui Yang, Yulu Gan, Denis A Gudovskiy, Zhen Dong, Haofan Wang, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical study shows Split-Ensemble, without additional computational cost, improves accuracy over a single model by 0.8%, 1.8%, and 25.5% on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively. OOD detection for the same backbone and in-distribution datasets surpasses a single model baseline by 2.2%, 8.1%, and 29.6% in mean AUROC, respectively. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science, Peking University; (2) University of California, Berkeley; (3) Panasonic Holdings Corporation; (4) Carnegie Mellon University. |
| Pseudocode | Yes | The detailed process of Split-Ensemble training is provided in the pseudo-code in Algorithm 1 of Appendix B. |
| Open Source Code | No | The paper does not provide a statement about releasing open-source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We perform classification tasks on four popular image classification benchmarks, including CIFAR-10, CIFAR-100 (Krizhevsky, 2009), Tiny ImageNet (Deng et al., 2009) and ImageNet (Krizhevsky et al., 2012) datasets. |
| Dataset Splits | No | The paper mentions 'test(val) sets' and discusses training and testing phases but does not explicitly provide specific training/validation/test dataset split percentages, absolute sample counts, or explicit references to predefined splits with citations that define these proportions. |
| Hardware Specification | Yes | Our Split-Ensemble model was trained over 200 epochs using a single NVIDIA A100 GPU with 80GB of memory, for experiments involving CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. For the larger-scale ImageNet dataset, we employ 8 NVIDIA A100 GPUs, each with 80GB memory, to handle the increased computational demands. |
| Software Dependencies | No | The paper mentions using a library from (Kirchheim et al., 2022) for Gaussian and Uniform Noise generation, which is 'Pytorch-ood', but does not specify the version of PyTorch or any other core software dependencies with version numbers used for their own implementation. |
| Experiment Setup | Yes | We use an SGD optimizer with a momentum of 0.9 and weight decay of 0.0005. We also adopt a 200-epoch cosine learning rate schedule with 10 warm-up epochs and a batch size of 256. |
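
The schedule quoted in the Experiment Setup row can be sketched in plain Python to make its shape concrete. This is a minimal illustration, not the authors' implementation: the base learning rate (`BASE_LR = 0.1`), the linear form of the warm-up, decay to zero, and the function name `lr_at_epoch` are all assumptions, since the quoted setup only gives the epoch counts, momentum, weight decay, and batch size.

```python
import math

# Values quoted in the Experiment Setup row.
TOTAL_EPOCHS = 200
WARMUP_EPOCHS = 10
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005
BATCH_SIZE = 256

# Assumption: the base learning rate is not stated in this summary.
BASE_LR = 0.1


def lr_at_epoch(epoch, base_lr=BASE_LR, total=TOTAL_EPOCHS, warmup=WARMUP_EPOCHS):
    """Learning rate for a 0-indexed epoch.

    Assumed shape: linear warm-up over the first `warmup` epochs,
    then cosine decay towards zero over the remaining epochs.
    """
    if epoch < warmup:
        # Linear ramp that reaches base_lr on the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup
    # Fraction of the post-warm-up phase completed, in [0, 1).
    progress = (epoch - warmup) / (total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 9, 10, 100, 199):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.6f}")
```

In a training loop, the returned value would be assigned to the optimizer's learning rate at the start of each epoch, with the momentum and weight decay constants passed to the SGD optimizer.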