The Neural Testbed: Evaluating Joint Predictions

Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Dieterich Lawson, Botao Hao, Brendan O'Donoghue, Benjamin Van Roy

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate a range of agents using a simple neural network data generating process. Our results indicate that some popular Bayesian deep learning agents do not fare well with joint predictions, even when they can produce accurate marginal predictions.
Researcher Affiliation | Industry | DeepMind, Efficient Agent Team, Mountain View
Pseudocode | Yes | Figure 3: Algorithm 1 KL-Loss Estimation (a Monte Carlo sketch of this estimator appears after this table)
Open Source Code | Yes | Together with this conceptual contribution, we open-source code in Appendix A. This consists of highly optimized evaluation code, reference agent implementations and automated reproducible analysis.
Open Datasets | No | The Neural Testbed works by generating random classification problems using a neural-network-based generative process (sketched after this table). The paper emphasizes using a generative model to produce unlimited data, rather than relying on a fixed, publicly available dataset with concrete access information or a formal citation.
Dataset Splits | No | The paper states: 'The testbed splits data into a training set and testing set, allows a deep learning agent to train on the training set, and then evaluates the quality of the predictions on the testing set.' It does not explicitly mention a separate validation set or describe validation splits.
Hardware Specification | No | The paper states: 'Our experiments make extensive use of parallel computation to facilitate hyperparameter sweeps. Nevertheless, the overall computational cost is relatively low by modern deep learning standards and relies only on standard CPUs.' It does not provide specific models or detailed specifications for the hardware used.
Software Dependencies | No | The paper mentions 'The testbed uses JAX internally (Bradbury et al., 2018), but can be used to evaluate any python agent.' However, it does not specify version numbers for JAX or other key software components used in their experiments.
Experiment Setup | Yes | Table 1 lists agents that we study and compare as well as hyperparameters that we tune. In our experiments, we optimize these hyperparameters via grid search (a generic grid-search sketch appears after this table).
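
To make the pseudocode row above more concrete, the following is a minimal sketch of a Monte Carlo KL-loss estimate for a joint prediction over tau test points, in the spirit of Algorithm 1. The function name, the argument shapes, and the assumption that both the true environment and the agent expose logits are illustrative choices, not the testbed's actual API.

    import jax
    import jax.numpy as jnp

    def joint_kl_estimate(true_logits, agent_logit_samples, y):
        """One Monte Carlo term of the KL loss on a batch of tau test points.

        true_logits:         [tau, num_classes] logits from the true environment.
        agent_logit_samples: [num_samples, tau, num_classes] logits, one set per
                             sampled epistemic index (assumed agent output).
        y:                   [tau] integer labels sampled from the true environment.
        """
        tau = y.shape[0]
        # Log-likelihood of the sampled labels under the true environment.
        true_ll = jax.nn.log_softmax(true_logits)[jnp.arange(tau), y].sum()
        # The agent's joint likelihood averages the product of per-point
        # likelihoods over epistemic samples; work in log space for stability.
        per_sample_ll = jax.nn.log_softmax(agent_logit_samples)[
            :, jnp.arange(tau), y].sum(axis=-1)
        agent_ll = jax.scipy.special.logsumexp(per_sample_ll) - jnp.log(
            per_sample_ll.shape[0])
        # Averaging (true_ll - agent_ll) over many sampled problems and test
        # batches yields a KL-loss estimate of this form.
        return true_ll - agent_ll

The same estimator applies to marginal predictions with tau = 1; the gap between the marginal and joint settings is what separates agents in the quoted result above.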
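
The 'Open Datasets' row describes a neural-network-based generative process rather than a fixed dataset. Below is a minimal sketch of such a process, assuming Gaussian inputs, a randomly initialized two-layer MLP as the true environment, and a softmax temperature; these specific choices are assumptions for illustration, and the open-sourced testbed defines its own generative model and default settings.

    import jax
    import jax.numpy as jnp

    def sample_classification_problem(key, num_train=100, num_test=1000,
                                      input_dim=2, hidden=50, num_classes=2,
                                      temperature=0.1):
        """Sample one random classification problem from a random MLP."""
        k_env1, k_env2, k_x, k_y = jax.random.split(key, 4)
        # A randomly initialized MLP acts as the "true" environment.
        w1 = jax.random.normal(k_env1, (input_dim, hidden)) / jnp.sqrt(input_dim)
        w2 = jax.random.normal(k_env2, (hidden, num_classes)) / jnp.sqrt(hidden)

        def true_logits(x):
            return jax.nn.relu(x @ w1) @ w2 / temperature

        # Sample inputs and labels, then split into train and test sets.
        x = jax.random.normal(k_x, (num_train + num_test, input_dim))
        y = jax.random.categorical(k_y, true_logits(x))
        train = (x[:num_train], y[:num_train])
        test = (x[num_train:], y[num_train:])
        return train, test, true_logits

Because a fresh problem can be sampled for every evaluation seed, the testbed does not depend on a fixed public dataset, which is the point the row above makes.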
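
Since the 'Software Dependencies' and 'Experiment Setup' rows note that any Python agent can be evaluated and that hyperparameters are tuned via grid search, here is a small sketch of a generic grid-search loop over an agent factory. The fit method, the evaluate_kl_loss helper, and the example hyperparameters are assumptions made for illustration; they are not the testbed's API or the contents of Table 1.

    import itertools

    def grid_search(agent_factory, param_grid, evaluate_kl_loss, problems):
        """Pick the hyperparameters with the lowest mean KL-loss estimate.

        agent_factory:    maps a hyperparameter dict to a fresh, untrained agent.
        param_grid:       dict of hyperparameter name -> list of candidate values.
        evaluate_kl_loss: callable (agent, problem) -> scalar KL-loss estimate.
        problems:         list of sampled testbed problems (train/test data).
        """
        best_params, best_loss = None, float('inf')
        names, values = zip(*param_grid.items())
        for combo in itertools.product(*values):
            params = dict(zip(names, combo))
            losses = []
            for problem in problems:
                agent = agent_factory(params)
                agent.fit(*problem.train)  # assumed training hook on the agent
                losses.append(evaluate_kl_loss(agent, problem))
            mean_loss = sum(losses) / len(losses)
            if mean_loss < best_loss:
                best_params, best_loss = params, mean_loss
        return best_params, best_loss

    # Hypothetical usage, with made-up hyperparameter names and values:
    # best, loss = grid_search(make_ensemble_agent,
    #                          {'learning_rate': [1e-3, 1e-2],
    #                           'num_ensemble': [10, 30, 100]},
    #                          evaluate_kl_loss, sampled_problems)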