Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Progressive Ensemble Distillation: Building Ensembles for Efficient Inference
Authors: Don Dennis, Abhishek Shetty, Anish Prasad Sevekari, Kazuhito Koishida, Virginia Smith
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of B-DISTIL by decomposing pretrained models across standard image, speech, and sensor datasets. We empirically evaluate our algorithm on synthetic and real-world classification tasks from computer vision, speech, and sensor processing with models suitable for the respective domains. |
| Researcher Affiliation | Collaboration | Don Kurian Dennis (Carnegie Mellon University); Abhishek Shetty (University of California, Berkeley); Anish Sevekari (Carnegie Mellon University); Kazuhito Koishida (Microsoft); Virginia Smith (Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1 B-DISTIL: Main algorithm; Algorithm 2 FIND-WL |
| Open Source Code | Yes | Our code can be found at: github.com/metastableB/bdistil. |
| Open Datasets | Yes | Our image classification experiments use the CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet datasets. For time-series classification tasks we use the Google-13 speech commands dataset. Finally, we use the daily sports activities (DSA) dataset for experiments with sensor data. |
| Dataset Splits | Yes | Except for the pretrained ResNet models, all other teacher models are selected based on performance on validation data. Dataset statistics (train samples / test-val samples / number of labels / source): CIFAR-10: 50000 / 10000 / 10 / [29]; CIFAR-100: 50000 / 10000 / 100 / [29]; DSA-19: 6800 / 2280 / 19 / [14]; Google-13: 52886 / 6835 / 13 / [43]; ImageNet-1k: 1281167 / 50000 / 1000 / [37]; Tiny ImageNet-200: 100000 / 10000 / 200 / [30]. |
| Hardware Specification | Yes | For simplicity of presentation, we convert these to the corresponding inference times (τ) on a reference accelerator (NVIDIA 3090Ti). |
| Software Dependencies | No | The paper mentions using 'PyTorch' and the 'torch.autograd.profiler' module, but does not specify exact version numbers for these or any other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | For experiments on CIFAR100 and CIFAR10, we use a learning rate of 0.1, a momentum parameter of 0.9, and weight decay of 5 × 10⁻⁴. We train for 200 epochs and reduce the learning rate by a factor of 0.2 after 30%, 60% and 90% of the epoch execution. We perform a 4-GPU data-parallel training for ImageNet with a per-GPU batch size of 256, learning rate 0.1, momentum 0.9, regularization γ of 1.0, and a weight decay of 1e-4. We train for 90 epochs and discount the learning rate by a factor of 0.1 at 30% and 60% epochs. For experiments with time series data, Google-13 and DSA-19, we use a fixed learning rate of 0.05 and a momentum of 0.9. We do not use weight decay or learning rate scheduling for time-series data. |
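
The step schedules quoted in the Experiment Setup row can be made concrete with a short sketch. This is an illustration, not code from the B-DISTIL repository: the names `milestone_epochs`, `step_lr`, `CIFAR`, and `IMAGENET` are hypothetical. It assumes that "reduce by a factor of 0.2" means multiplying the learning rate by 0.2, that epochs are 0-indexed, and that each drop takes effect at the start of the milestone epoch.

```python
# Hypothetical helpers illustrating the quoted step schedules.
# Assumptions (not stated in the source): the factor is multiplicative,
# epochs are 0-indexed, and a drop applies from the milestone epoch onward.

def milestone_epochs(total_epochs, fractions):
    """Convert fractions of training (e.g. 0.3) into epoch indices."""
    return [round(f * total_epochs) for f in fractions]


def step_lr(epoch, base_lr, total_epochs, fractions, factor):
    """Learning rate at a given epoch under a step-decay schedule."""
    milestones = milestone_epochs(total_epochs, fractions)
    # Count how many milestones have already been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


# Values taken from the Experiment Setup row.
CIFAR = dict(base_lr=0.1, total_epochs=200, fractions=(0.3, 0.6, 0.9), factor=0.2)
IMAGENET = dict(base_lr=0.1, total_epochs=90, fractions=(0.3, 0.6), factor=0.1)

for name, cfg in (("CIFAR", CIFAR), ("ImageNet", IMAGENET)):
    epochs = milestone_epochs(cfg["total_epochs"], cfg["fractions"])
    print(name, "learning-rate drops at epochs", epochs)
```

Under these assumptions the CIFAR runs drop the learning rate at epochs 60, 120 and 180, and the ImageNet runs at epochs 27 and 54. The time-series experiments (Google-13, DSA-19) use a fixed learning rate of 0.05 and need no schedule.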