Offline Contextual Bandits with Overparameterized Models

Authors: David Brandfonbrener, William Whitney, Rajesh Ranganath, Joan Bruna

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate the gap in both action stability and bandit error between policy-based and value-based algorithms when using large neural network models on synthetic and image-based datasets.
Researcher Affiliation | Academia | Courant Institute of Mathematical Sciences, New York University, New York, New York, USA.
Pseudocode | No | The paper describes algorithms and mathematical formulations (e.g., Equations 1-6) but does not include structured pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Code can be found at https://github.com/davidbrandfonbrener/deep-offline-bandits.
Open Datasets | Yes | We will use a bandit version of CIFAR-10 (Krizhevsky, 2009). To turn CIFAR into an offline bandit problem we view each possible label as an action and assign reward of 1 for a correct label/action and 0 for an incorrect label/action. (A code sketch of this conversion appears below the table.)
Dataset Splits | No | For these experiments we set K = 2, d = 10, ϵ = 0.1. We take N = 100 training points and sample an independent test set of 500 points. (The paper specifies training and test sets but does not mention a separate validation set or cross-validation setup.)
Hardware Specification | No | The paper mentions MLP and Resnet-18 models, but it does not specify hardware details such as GPU or CPU models or memory configurations used for the experiments.
Software Dependencies | No | We train Resnet-18 (He et al., 2016) models using Pytorch (Paszke et al., 2019). (The paper mentions PyTorch but does not specify a version number or other software dependencies with their versions.)
Experiment Setup | Yes | For these experiments we set K = 2, d = 10, ϵ = 0.1. We take N = 100 training points and sample an independent test set of 500 points. As our models we use MLPs with one hidden layer of width 512. (...) Full details about the training procedure along with learning curves and further results are in Appendix E. (A sketch of a model at this scale follows the table.)
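As a concrete illustration of the CIFAR-10 bandit conversion quoted in the Open Datasets row, the sketch below builds an offline bandit dataset in which each of the 10 labels is an action and the reward is 1 for the correct action and 0 otherwise. The uniform-random logging policy and the helper name `cifar_to_bandit` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import torch
from torchvision import datasets, transforms

def cifar_to_bandit(root="./data", num_actions=10, seed=0):
    """Turn CIFAR-10 into an offline bandit dataset: labels become actions,
    and the reward is 1 for the correct action and 0 otherwise."""
    rng = np.random.default_rng(seed)
    train = datasets.CIFAR10(root=root, train=True, download=True,
                             transform=transforms.ToTensor())
    contexts = torch.stack([x for x, _ in train])   # images serve as contexts
    labels = torch.tensor([y for _, y in train])    # ground-truth labels
    # Assumed logging policy: uniform-random actions (a placeholder, not the paper's).
    actions = torch.tensor(rng.integers(0, num_actions, size=len(train)))
    rewards = (actions == labels).float()           # 1 if the logged action is correct, else 0
    return contexts, actions, rewards
```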
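The Experiment Setup row quotes the synthetic configuration (K = 2 actions, d = 10 dimensional contexts, N = 100 training points, a 500-point test set) and the model class (an MLP with one hidden layer of width 512). The minimal PyTorch sketch below shows a model at that scale; the ReLU activation and default initialization are assumptions, since the paper defers training details to its Appendix E.

```python
import torch
import torch.nn as nn

K, d, N = 2, 10, 100   # actions, context dimension, training points (as quoted above)

# Assumed architecture details: ReLU activation, default PyTorch initialization.
model = nn.Sequential(
    nn.Linear(d, 512),
    nn.ReLU(),
    nn.Linear(512, K),
)

# Even this small MLP has far more parameters than the N = 100 training points,
# i.e. it is overparameterized in the sense the paper studies.
print(sum(p.numel() for p in model.parameters()))  # 6658 parameters
```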