Collegial Ensembles

Authors: Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the following section we conduct experiments to both validate our assumptions, and evaluate our framework for efficient ensemble search. Starting with a toy example, we evaluate the effect of Var(K) and βe on test performance using fully connected models trained on the MNIST dataset. For the latter experiments, we move to larger scale models trained on CIFAR-10/100 and the ImageNet [4] datasets.
Researcher Affiliation | Industry | Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan (Apple Inc.) {elittwin, bmyara, sima, jsusskind, szhai, ogolan}@apple.com
Pseudocode | Yes | Algorithm 1: Fitting α per architecture
Open Source Code | No | The paper does not explicitly state that source code for the methodology is provided, nor does it include any links to a code repository.
Open Datasets | Yes | Starting with a toy example, we evaluate the effect of Var(K) and βe on test performance using fully connected models trained on the MNIST dataset. For the latter experiments, we move to larger scale models trained on CIFAR-10/100 and the ImageNet [4] datasets.
Dataset Splits | No | The paper uses standard datasets such as CIFAR-10/100 and ImageNet, which have predefined splits, but it does not explicitly state train/validation/test percentages, sample counts, or the methodology used to split the data.
Hardware Specification | No | The paper mentions '8 GPUs' but does not specify the GPU model or type (e.g., NVIDIA V100, A100) or any other hardware details such as CPU models or memory.
Software Dependencies | No | The paper mentions optimizers such as the 'Adam optimizer' and 'SGD', but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We use SGD with 0.9 momentum and a batch size of 256 on 8 GPUs (32 per GPU). The weight decay is 0.0001 and the initial learning rate 0.1. We train the models for 100 epochs and divide the learning rate by a factor of 10 at epoch 30, 60 and 90. (A configuration sketch of this recipe appears below.)
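The hyperparameters quoted in the Experiment Setup row translate into a short optimizer and schedule configuration. Since the paper does not name its deep learning framework (see Software Dependencies), the sketch below assumes PyTorch; the model and the training loop body are hypothetical placeholders used only to make the configuration concrete.

```python
# Minimal sketch of the quoted training recipe, assuming PyTorch (framework not stated in the paper).
# The model is a placeholder; the batch size of 256 corresponds to 32 samples per GPU on 8 GPUs.
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)  # hypothetical placeholder model

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate 0.1
    momentum=0.9,       # SGD with 0.9 momentum
    weight_decay=1e-4,  # weight decay 0.0001
)

# Divide the learning rate by a factor of 10 at epochs 30, 60, and 90.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1
)

for epoch in range(100):  # train for 100 epochs
    # for inputs, targets in train_loader:  # batch size 256 (hypothetical data loader)
    #     optimizer.zero_grad()
    #     loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    #     loss.backward()
    #     optimizer.step()
    scheduler.step()
```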