Collegial Ensembles

Authors: Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the following section we conduct experiments to both validate our assumptions, and evaluate our framework for efficient ensemble search. Starting with a toy example, we evaluate the effect of Var(K) and βe on test performance using fully connected models trained on the MNIST dataset. For the latter experiments, we move to larger scale models trained on CIFAR-10/100 and the ImageNet [4] datasets.
Researcher Affiliation | Industry | Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan (Apple Inc.) {elittwin, bmyara, sima, jsusskind, szhai, ogolan}@apple.com
Pseudocode | Yes | Algorithm 1: Fitting α per architecture
Open Source Code | No | The paper does not explicitly state that source code for the methodology is provided, nor does it include any links to a code repository.
Open Datasets | Yes | Starting with a toy example, we evaluate the effect of Var(K) and βe on test performance using fully connected models trained on the MNIST dataset. For the latter experiments, we move to larger scale models trained on CIFAR-10/100 and the ImageNet [4] datasets.
Dataset Splits | No | The paper uses standard datasets such as CIFAR-10/100 and ImageNet, which have predefined splits, but it does not explicitly state train/validation/test percentages, sample counts, or the methodology used to split the data.
Hardware Specification | No | The paper mentions '8 GPUs' but does not specify the GPU model or type (e.g., NVIDIA V100, A100) or any other hardware details such as CPU models or memory.
Software Dependencies | No | The paper mentions optimizers such as the 'Adam optimizer' and 'SGD', but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We use SGD with 0.9 momentum and a batch size of 256 on 8 GPUs (32 per GPU). The weight decay is 0.0001 and the initial learning rate 0.1. We train the models for 100 epochs and divide the learning rate by a factor of 10 at epoch 30, 60 and 90. (A configuration sketch of this recipe appears below.)
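The hyperparameters quoted in the Experiment Setup row translate into a short optimizer and schedule configuration. Since the paper does not name its deep learning framework (see Software Dependencies), the sketch below assumes PyTorch; the model and the training loop body are hypothetical placeholders used only to make the configuration concrete.

```python
# Minimal sketch of the quoted training recipe, assuming PyTorch (framework not stated in the paper).
# The model is a placeholder; the batch size of 256 corresponds to 32 samples per GPU on 8 GPUs.
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)  # hypothetical placeholder model

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate 0.1
    momentum=0.9,       # SGD with 0.9 momentum
    weight_decay=1e-4,  # weight decay 0.0001
)

# Divide the learning rate by a factor of 10 at epochs 30, 60, and 90.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1
)

for epoch in range(100):  # train for 100 epochs
    # for inputs, targets in train_loader:  # batch size 256 (hypothetical data loader)
    #     optimizer.zero_grad()
    #     loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    #     loss.backward()
    #     optimizer.step()
    scheduler.step()
```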