Modeling the Machine Learning Multiverse
Authors: Samuel J. Bell, Onno Kampman, Jesse Dodge, Neil Lawrence
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the first of two case studies, we investigate disputed claims about the relative merit of adaptive optimizers. Second, we synthesize conflicting research on the effect of learning rate on the large batch training generalization gap. Our framework is designed to facilitate drawing robust scientific conclusions about model performance, and thus our approach focuses on exploration rather than conventional optimization. |
| Researcher Affiliation | Collaboration | 1. Computer Laboratory, University of Cambridge; 2. Department of Psychology, University of Cambridge; 3. Allen Institute for AI |
| Pseudocode | No | The paper describes steps for efficient multiverse exploration (1. Sample an initial design, 2. Fit a GP model, 3. Use an acquisition function, 4. Repeat steps), but this is presented as a numbered list within the text, not as a formally labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] To be included in supplementary materials. |
| Open Datasets | Yes | Our evaluation function is the test accuracy of the SVM on the Wisconsin Breast Cancer Dataset [49]...trained for 300 epochs on CIFAR-10...dataset ∈ {CIFAR-10, CIFAR-100 [53], Tiny ImageNet [73]}. |
| Dataset Splits | No | The paper mentions training and testing on datasets like CIFAR-10 but does not specify the explicit percentages or sample counts for training, validation, or test splits. It refers to 'test accuracy' but does not detail how the data was partitioned into these specific splits. |
| Hardware Specification | Yes | Calculated using https://mlco2.github.io/impact [77] assuming A100 GPUs on the University of Cambridge HPC cluster with carbon efficiency 0.307 kg CO2/kWh. |
| Software Dependencies | No | For GP modeling we use GPy [44] with Emukit [45] for experimental design and sensitivity analysis. We use TorchVision's [46] off-the-shelf deep learning model architectures. The paper names software but does not specify version numbers for reproducibility. |
| Experiment Setup | Yes | We set our search space to learning rate ∈ [10^-4, 10^0] by ∈ [10^-11, 10^-4]...The model is VGG-16 with batch normalization [34] and dropout [56], trained for 300 epochs on CIFAR-10...The search space includes learning rate ∈ [10^-4, 10^-1/2], batch size ∈ {2^4, ..., 2^13}, model ∈ {AlexNet [71], VGG [55], ResNet [72]}, and dataset ∈ {CIFAR-10, CIFAR-100 [53], Tiny ImageNet [73]}. |
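The four-step loop quoted in the Pseudocode row (sample an initial design, fit a GP model, use an acquisition function, repeat) can be sketched in code. This is a minimal NumPy illustration, not the authors' implementation: the paper uses GPy and Emukit, while here the GP regression is written out by hand, and the variance-maximizing acquisition, function names, and toy objective are all assumptions chosen to reflect the paper's exploration-rather-than-optimization framing.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Standard GP regression: posterior mean and variance at test points Xs.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xs, Xs)) - (v**2).sum(0)
    return mean, var

def explore_multiverse(evaluate, bounds, n_init=5, n_steps=20, n_cand=256, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # 1. Sample an initial design over the search space.
    X = rng.uniform(lo, hi, size=(n_init, len(lo)))
    y = np.array([evaluate(x) for x in X])
    for _ in range(n_steps):
        # 2. Fit a GP surrogate to all evaluations so far.
        cand = rng.uniform(lo, hi, size=(n_cand, len(lo)))
        _, var = gp_posterior(X, y, cand)
        # 3. Acquisition: pick the candidate with the highest predictive
        #    variance (pure exploration, not conventional optimization).
        x_next = cand[np.argmax(var)]
        # 4. Evaluate the chosen point and repeat.
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate(x_next))
    return X, y

# Example: explore a hypothetical 1-D response surface over [0, 1].
X, y = explore_multiverse(lambda x: float(np.sin(3 * x[0])), ([0.0], [1.0]))
```

Maximizing predictive variance spreads evaluations across the search space, which matches the row's note that the framework targets robust conclusions about the whole space rather than finding a single optimum.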