Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Wisdom of the Ensemble: Improving Consistency of Deep Learning Models

Authors: Lijing Wang, Dipanjan Ghosh, Maria Gonzalez Diaz, Ahmed Farahat, Mahbubul Alam, Chetan Gupta, Jiangzhuo Chen, Madhav Marathe

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To validate the theory using three datasets and two state-of-the-art deep learning classifiers, we also propose an efficient dynamic snapshot ensemble method and demonstrate its value. Code for our algorithm is available at https://github.com/christa60/dynens."
Researcher Affiliation | Collaboration | "Lijing Wang (University of Virginia, EMAIL); Dipanjan Ghosh (Hitachi America Ltd., EMAIL); Maria Teresa Gonzalez Diaz (Hitachi America Ltd., EMAIL); Ahmed Farahat (Hitachi America Ltd., EMAIL); Mahbubul Alam (Hitachi America Ltd., EMAIL); Chetan Gupta (Hitachi America Ltd., EMAIL); Jiangzhuo Chen (University of Virginia, EMAIL); Madhav Marathe (University of Virginia, EMAIL)"
Pseudocode | Yes | "Algorithm 1: Pseudocode of the dynamic snapshot ensemble (DynSnap)"
Open Source Code | Yes | "Code for our algorithm is available at https://github.com/christa60/dynens."
Open Datasets | Yes | "We conduct experiments using three datasets and two state-of-the-art models. YAHOO!Answers [36] is a topic classification dataset with 10 output categories and 140K/6K training/testing samples. CIFAR10 and CIFAR100 [23] are datasets with 10 and 100 output categories respectively, with 50K color images for training and 10K for testing."
Dataset Splits | Yes | "The dataset, models and hyper-parameters are shown in Table 1. Table 1: Data and Models ... Training ... Validation ... Testing"
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not list specific software components with version numbers required for reproducibility.
Experiment Setup | Yes | "The experiment settings for Single Base models are shown in Table 1. We set m = 20 for ensemble methods, and N = 10, β = β for DynSnap-cyc and DynSnap-step; Fd(t) in DynSnap-step is 1e-1, 1e-2, 1e-3 at 80, 120, 160 epochs; dropout with 0.1 drop probability."
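The step decay Fd(t) quoted in the Experiment Setup row (1e-1, 1e-2, 1e-3 at epochs 80, 120, 160) can be sketched as a simple schedule function. This is a minimal illustration only; the function name and the exact boundary semantics are assumptions, not the authors' implementation from the dynens repository.

```python
def step_decay_lr(epoch):
    """Step-decay learning rate schedule.

    Returns 1e-1 for early epochs, dropping to 1e-2 and then 1e-3.
    The switch points (80 and 120) are assumed from the quoted setup,
    which lists values 1e-1, 1e-2, 1e-3 at epochs 80, 120, 160.
    """
    if epoch < 80:
        return 1e-1
    if epoch < 120:
        return 1e-2
    return 1e-3
```

In frameworks such as PyTorch, an equivalent schedule is commonly expressed with a multi-step scheduler (milestones at the switch epochs, decay factor 0.1).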