Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Inference Efficient Deep Ensemble Learning
Authors: Ziyue Li, Kan Ren, Yifan Yang, Xinyang Jiang, Yuqing Yang, Dongsheng Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments with different backbones on real-world datasets illustrate our method can bring up to 56% inference cost reduction while maintaining comparable performance to full ensemble, achieving significantly better ensemble utility than other baselines. Code and supplemental materials are available at https://seqml.github.io/irene. [...] Experimental Setup: Here we present the details of experimental setup, including datasets, backbones used, and baselines for comparison. |
| Researcher Affiliation | Industry | Microsoft Research EMAIL, EMAIL |
| Pseudocode | Yes | The overall training algorithm has been illustrated in Appendix B. |
| Open Source Code | Yes | Code and supplemental materials are available at https://seqml.github.io/irene. |
| Open Datasets | Yes | We conduct experiments on two image classification datasets, CIFAR-10 and CIFAR-100, the primary focus of neural ensemble methods (Zhang, Liu, and Yan 2020; Rame and Cord 2021). CIFAR (Krizhevsky, Hinton et al. 2009) contains 50,000 training samples and 10,000 test samples, which are labeled as 10 and 100 classes in CIFAR-10 and CIFAR-100, respectively. |
| Dataset Splits | No | The paper mentions 50,000 training samples and 10,000 test samples for CIFAR datasets, but does not explicitly provide details about a validation split or how it was derived. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or cloud instances) are mentioned for running the experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow) are provided. |
| Experiment Setup | No | The paper mentions datasets, backbones (ResNet-32 and ResNet-18), and that ensemble methods use three base models (T=3), but it does not provide specific hyperparameters like learning rates, batch sizes, number of epochs, optimizer settings, or the values for the loss weights (ω1, ω2, ω3). |