Modeling the Machine Learning Multiverse
Authors: Samuel J. Bell, Onno Kampman, Jesse Dodge, Neil Lawrence
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the first of two case studies, we investigate disputed claims about the relative merit of adaptive optimizers. Second, we synthesize conflicting research on the effect of learning rate on the large batch training generalization gap. Our framework is designed to facilitate drawing robust scientific conclusions about model performance, and thus our approach focuses on exploration rather than conventional optimization. |
| Researcher Affiliation | Collaboration | 1. Computer Laboratory, University of Cambridge; 2. Department of Psychology, University of Cambridge; 3. Allen Institute for AI |
| Pseudocode | No | The paper describes steps for efficient multiverse exploration (1. Sample an initial design, 2. Fit a GP model, 3. Use an acquisition function, 4. Repeat steps), but this is presented as a numbered list within the text, not as a formally labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] To be included in supplementary materials. |
| Open Datasets | Yes | Our evaluation function is the test accuracy of the SVM on the Wisconsin Breast Cancer Dataset [49]...trained for 300 epochs on CIFAR-10...dataset ∈ {CIFAR-10, CIFAR-100 [53], Tiny ImageNet [73]}. |
| Dataset Splits | No | The paper mentions training and testing on datasets like CIFAR-10 but does not specify the explicit percentages or sample counts for training, validation, or test splits. It refers to 'test accuracy' but does not detail how the data was partitioned into these specific splits. |
| Hardware Specification | Yes | Calculated using https://mlco2.github.io/impact [77] assuming A100 GPUs on the University of Cambridge HPC cluster with carbon efficiency 0.307 kg CO2/kWh. |
| Software Dependencies | No | For GP modeling we use GPy [44] with Emukit [45] for experimental design and sensitivity analysis. We use TorchVision's [46] off-the-shelf deep learning model architectures. The paper names software but does not specify version numbers for reproducibility. |
| Experiment Setup | Yes | We set our search space to learning rate ∈ [10^-4, 10^0] by ∈ [10^-11, 10^-4]...The model is VGG-16 with batch normalization [34] and dropout [56], trained for 300 epochs on CIFAR-10...The search space includes learning rate ∈ [10^-4, 10^-1/2], batch size ∈ {2^4, ..., 2^13}, model ∈ {AlexNet [71], VGG [55], ResNet [72]}, and dataset ∈ {CIFAR-10, CIFAR-100 [53], Tiny ImageNet [73]}. |
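The four-step loop quoted in the Pseudocode row (sample an initial design, fit a GP model, use an acquisition function, repeat) can be sketched in code. This is a minimal NumPy illustration, not the authors' implementation: the paper uses GPy and Emukit, while here the GP regression is written out by hand, and the variance-maximizing acquisition, function names, and toy objective are all assumptions chosen to reflect the paper's exploration-rather-than-optimization framing.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Standard GP regression: posterior mean and variance at test points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xs, Xs)) - (v**2).sum(0)
    return mean, var

def explore_multiverse(evaluate, bounds, n_init=5, n_steps=20, n_cand=256, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # 1. Sample an initial design over the search space.
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))
    y = np.array([evaluate(x) for x in X])
    for _ in range(n_steps):
        # 2. Fit a GP surrogate to all evaluations so far.
        cand = rng.uniform(lo, hi, size=(n_cand, len(lo)))
        _, var = gp_posterior(X, y, cand)
        # 3. Acquisition: pick the candidate with the highest predictive
        #    variance (pure exploration, not conventional optimization).
        x_next = cand[np.argmax(var)]
        # 4. Evaluate the chosen point and repeat.
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate(x_next))
    return X, y

# Example: explore a hypothetical 1-D response surface over [0, 1].
X, y = explore_multiverse(lambda x: float(np.sin(3 * x[0])), ([0.0], [1.0]))
```

Maximizing predictive variance spreads evaluations across the search space, which matches the row's note that the framework targets robust conclusions about the whole space rather than finding a single optimum.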