Diverse Ensemble Evolution: Curriculum Data-Model Marriage

Authors: Tianyi Zhou, Shengjie Wang, Jeff A. Bilmes

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, DivE2 outperforms other ensemble training methods under a variety of model aggregation techniques, while also maintaining competitive efficiency. We apply DivE2 to four benchmark datasets, and show that it improves over randomization-based ensemble training methods on a variety of approaches to aggregate ensemble models into a single prediction.
Researcher Affiliation | Academia | University of Washington, Seattle. {tianyizh, wangsj, bilmes}@uw.edu
Pseudocode | Yes | Algorithm 1: SELECTLEARN(k, p, λ, γ, {w_i^0})
Open Source Code | No | The paper does not contain an unambiguous statement that the authors are releasing the code for the described methodology, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | (1) MobileNetV2 [56] on CIFAR10 [38]; (2) ResNet18 [29] on CIFAR100 [38]; (3) CNNs with two convolutional layers on Fashion-MNIST ("Fashion" in all tables) [69]; (4) CNNs with six convolutional layers on STL10 [12].
Dataset Splits | No | The paper mentions training and testing but does not explicitly provide the training/validation/test dataset splits (e.g., percentages, sample counts, or an explicit cross-validation setup) needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | We everywhere fix the number of models at m = 10, and use ℓ2 parameter regularization on w with weight 1×10^-4. In DivE2's training phase, we start from k = 6, p = n/2m and linearly change to k = 1, p = 3n/m over T = 200 episodes.
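The reported curriculum can be sketched as a simple linear interpolation. This is a hedged illustration, not the authors' code: `curriculum_schedule`, the 0-indexed episode `t`, and the value n = 50000 (e.g. the CIFAR10 training-set size) are assumptions; only the endpoints k: 6→1, p: n/2m→3n/m, T = 200, and m = 10 come from the reported setup.

```python
# Hypothetical sketch of the linear schedule described in the experiment setup:
# over T = 200 episodes, k (models selected per sample) anneals from 6 to 1,
# while p (samples assigned per model) grows from n/(2m) to 3n/m.
# n = 50000 and the function/argument names are assumptions for illustration.

def curriculum_schedule(t, T=200, n=50000, m=10):
    """Return (k, p) for episode t in [0, T-1] via linear interpolation."""
    frac = t / (T - 1)                                   # 0.0 at start, 1.0 at end
    k = round(6 + (1 - 6) * frac)                        # 6 -> 1
    p = n / (2 * m) + (3 * n / m - n / (2 * m)) * frac   # n/2m -> 3n/m
    return k, int(p)
```

For example, the first episode yields (k = 6, p = n/2m) and the last yields (k = 1, p = 3n/m), matching the stated start and end points of the curriculum.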