Neural Ensemble Search for Uncertainty Estimation and Dataset Shift
Authors: Sheheryar Zaidi, Arber Zela, Thomas Elsken, Chris C. Holmes, Frank Hutter, Yee Whye Teh
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a variety of classification tasks and modern architecture search spaces, we show that the resulting ensembles outperform deep ensembles not only in terms of accuracy but also uncertainty calibration and robustness to dataset shift. Our further analysis and ablation studies provide evidence of higher ensemble diversity due to architectural variation, resulting in ensembles that can outperform deep ensembles, even when having weaker average base learners. |
| Researcher Affiliation | Collaboration | 1University of Oxford, 2University of Freiburg, 3Bosch Center for Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1: NES with Regularized Evolution |
| Open Source Code | Yes | To foster reproducibility, our code is available: https://github.com/automl/nes |
| Open Datasets | Yes | We use Dval and Dtest for the validation and test datasets, respectively. Denote by fθ a neural network with weights θ, so fθ(x) ∈ ℝ^C is the predicted probability vector over the classes for input x. Let ℓ(fθ(x), y) be the neural network's loss for data point (x, y). |
| Dataset Splits | Yes | Let Dtrain = {(xi, yi) : i = 1, . . . , N} be the training dataset, where the input xi ∈ ℝ^D and, assuming a classification task, the output yi ∈ {1, . . . , C}. We use Dval and Dtest for the validation and test datasets, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (GPU/CPU models, processor types, memory amounts) used for running its experiments. It mentions using 'GPUs' in the context of NAS-Bench-201 but no specific models. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper notes that "Experiment details are available in Section 5 and Appendix B" but does not state hyperparameter values or training configurations in the main text; full reproducibility requires consulting those external sections. |
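The notation quoted above defines each base learner as a network fθ whose output fθ(x) ∈ ℝ^C is a probability vector over C classes; the ensemble prediction in deep ensembles and NES is the uniform average of these vectors. A minimal sketch of that averaging step (the function name and example values are illustrative, not from the paper):

```python
import numpy as np

def ensemble_predict(prob_vectors):
    """Uniformly average the predicted probability vectors
    f_theta(x) in R^C of M base learners to form the ensemble
    prediction for a single input x."""
    probs = np.stack(prob_vectors)  # shape (M, C)
    return probs.mean(axis=0)       # shape (C,)

# Example: three base learners on a C = 3 class problem.
members = [
    np.array([0.7, 0.2, 0.1]),
    np.array([0.6, 0.3, 0.1]),
    np.array([0.5, 0.4, 0.1]),
]
ens = ensemble_predict(members)  # approximately [0.6, 0.3, 0.1]
```

Because each member outputs a valid probability vector, the uniform average is also a valid probability vector, so the ensemble's calibration can be evaluated directly.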
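The pseudocode row cites "Algorithm 1: NES with Regularized Evolution". Regularized (aging) evolution maintains a fixed-size population, picks a parent by tournament from a random sample, adds a mutated child, and removes the oldest member. A generic sketch of that loop, under stated assumptions: `fitness(arch)` returns a score to maximize and `mutate(arch)` returns a perturbed copy — both hypothetical helpers, not the paper's actual implementation, which evolves architectures and selects ensembles from the evolved pool.

```python
import random

def regularized_evolution(init_pool, fitness, mutate,
                          iterations=100, pop_size=10, sample_size=3):
    """Minimal sketch of regularized (aging) evolution.
    `fitness` and `mutate` are assumed callables supplied by the user."""
    population = list(init_pool)[:pop_size]
    history = list(population)  # every candidate ever evaluated
    for _ in range(iterations):
        # Tournament selection: sample a few members, keep the fittest.
        candidates = random.sample(population, sample_size)
        parent = max(candidates, key=fitness)
        child = mutate(parent)
        history.append(child)
        population.append(child)
        population.pop(0)  # "aging": discard the oldest, not the worst
    return max(history, key=fitness)
```

Discarding the oldest member rather than the least fit is what distinguishes regularized evolution from plain tournament selection; in NES the final ensemble is then chosen from the evaluated pool rather than returning a single best candidate.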