Collegial Ensembles
Authors: Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following section we conduct experiments to both validate our assumptions and evaluate our framework for efficient ensemble search. Starting with a toy example, we evaluate the effect of Var(K) and β_e on test performance using fully connected models trained on the MNIST dataset. For the latter experiments, we move to larger scale models trained on CIFAR-10/100 and the ImageNet [4] datasets. |
| Researcher Affiliation | Industry | Etai Littwin, Ben Myara, Sima Sabah, Joshua Susskind, Shuangfei Zhai, Oren Golan; Apple Inc. {elittwin, bmyara, sima, jsusskind, szhai, ogolan}@apple.com |
| Pseudocode | Yes | Algorithm 1: Fitting α per architecture |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is provided, nor does it include any links to a code repository. |
| Open Datasets | Yes | Starting with a toy example, we evaluate the effect of Var(K) and β_e on test performance using fully connected models trained on the MNIST dataset. For the latter experiments, we move to larger scale models trained on CIFAR-10/100 and the ImageNet [4] datasets. |
| Dataset Splits | No | The paper mentions using standard datasets like CIFAR-10/100 and ImageNet, which typically have predefined splits. However, the text does not explicitly state the specific train/validation/test percentages, sample counts, or the methodology used for splitting the data. |
| Hardware Specification | No | The paper mentions '8 GPUs' but does not specify the model or type of GPUs (e.g., NVIDIA V100, A100) or any other specific hardware details like CPU models or memory. |
| Software Dependencies | No | The paper mentions optimizers like 'Adam optimizer' and 'SGD', but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We use SGD with 0.9 momentum and a batch size of 256 on 8 GPUs (32 per GPU). The weight decay is 0.0001 and the initial learning rate is 0.1. We train the models for 100 epochs and divide the learning rate by a factor of 10 at epochs 30, 60, and 90. |
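
The reported experiment setup maps directly onto a standard step-wise SGD schedule. The sketch below reproduces those hyperparameters (momentum 0.9, weight decay 0.0001, initial learning rate 0.1, decay by 10x at epochs 30/60/90 over 100 epochs) as a minimal PyTorch configuration. The framework choice, the placeholder model, and the omitted data loop are assumptions for illustration; the paper does not state which library was used.

```python
import torch
from torch import nn, optim

# Hypothetical stand-in model; the paper's actual collegial-ensemble
# architectures are not reproduced here.
model = nn.Linear(3 * 224 * 224, 1000)

# Optimizer as reported: SGD with 0.9 momentum, weight decay 0.0001,
# initial learning rate 0.1.
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)

# Learning rate divided by a factor of 10 at epochs 30, 60, and 90.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                           milestones=[30, 60, 90],
                                           gamma=0.1)

for epoch in range(100):
    # ... one training epoch at an effective batch size of 256
    # (32 per GPU across 8 GPUs, per the paper) ...
    scheduler.step()
```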