Bayesian Counterfactual Risk Minimization
Authors: Ben London, Ted Sandler
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We end with an empirical study of our theoretical results. First, we show that LPR outperforms standard L2 regularization whenever the logging policy is better than a uniform distribution. Second, we show that LPR is competitive with variance regularization, and even outperforms it on certain problems. Finally, we demonstrate that it is indeed possible to learn the logging policy for LPR with negligible impact on performance. These findings establish LPR as a simple, effective method for Bayesian CRM. |
| Researcher Affiliation | Industry | Amazon, Seattle, WA, USA. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. |
| Open Datasets | Yes | Fashion-MNIST (Xiao et al., 2017) and CIFAR-100 (Krizhevsky and Hinton, 2009). |
| Dataset Splits | Yes | Fashion-MNIST consists of 70,000 (60,000 training; 10,000 testing) grayscale images... we tune the regularization parameter, λ, using 5-fold cross-validation on each log dataset, with truncated IPS estimation of expected reward on the holdout set. (A hedged sketch of truncated IPS estimation appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'AdaGrad (Duchi et al., 2011)' and general methods, but does not specify any software names with version numbers for reproducibility (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | We set the learning rate to 0.1 and the smoothing parameter to one... with minibatches of 100 examples... we run training for 500 epochs, with random shuffling of the training data at each epoch. All model parameters are initialized to zero... In all experiments, we set τ = 0.01. (A hedged configuration sketch appears after the table.) |
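
The "Dataset Splits" row notes that the regularization parameter λ is tuned by 5-fold cross-validation, scoring each holdout fold with a truncated IPS estimate of expected reward, and the "Experiment Setup" row reports τ = 0.01. The sketch below illustrates one common form of such an estimator; the function name, the convention of flooring the logging propensities at τ, and the toy logged data are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def truncated_ips_estimate(rewards, logging_propensities, target_propensities, tau=0.01):
    """Truncated inverse-propensity-scoring (IPS) estimate of expected reward.

    rewards               -- observed rewards r_i for the logged actions
    logging_propensities  -- pi_0(a_i | x_i), propensities under the logging policy
    target_propensities   -- pi(a_i | x_i), propensities under the policy being evaluated
    tau                   -- truncation level; logging propensities are floored at tau
                             (one common convention; the quoted excerpt does not define tau precisely)
    """
    weights = target_propensities / np.maximum(logging_propensities, tau)
    return float(np.mean(rewards * weights))

# Hypothetical logged bandit feedback (five interactions), for illustration only.
r  = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
p0 = np.array([0.25, 0.10, 0.50, 0.05, 0.20])   # logging-policy propensities
p  = np.array([0.40, 0.05, 0.45, 0.30, 0.10])   # candidate-policy propensities
print(truncated_ips_estimate(r, p0, p))
```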
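
Similarly, the "Experiment Setup" row lists concrete optimization settings: AdaGrad with learning rate 0.1, a smoothing parameter of 1, minibatches of 100 examples, 500 epochs with reshuffling each epoch, all parameters initialized to zero, and τ = 0.01. A minimal training-loop sketch under those settings follows; the linear softmax policy, the negative truncated-IPS objective, and the reading of the smoothing parameter as the additive term in the AdaGrad denominator are assumptions, not specifics confirmed by the quoted text.

```python
import numpy as np

def adagrad_crm_train(X, actions, rewards, logging_prop, num_actions,
                      lr=0.1, smoothing=1.0, batch_size=100, epochs=500,
                      tau=0.01, seed=0):
    """Training-loop sketch using the quoted settings (lr=0.1, smoothing=1,
    minibatches of 100, 500 epochs with reshuffling, zero initialization).
    The linear softmax policy and negative truncated-IPS objective are assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, num_actions))   # all model parameters initialized to zero
    G = np.zeros_like(W)             # AdaGrad accumulator of squared gradients

    for _ in range(epochs):
        order = rng.permutation(n)   # random shuffling of the training data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            logits = X[idx] @ W
            logits -= logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)

            # Truncated IPS weights for the logged actions (flooring at tau is an assumption).
            pi = probs[np.arange(len(idx)), actions[idx]]
            w = pi / np.maximum(logging_prop[idx], tau)

            # Gradient of the negative IPS objective for a linear softmax policy.
            grad = np.zeros_like(W)
            coeff = -rewards[idx] * w
            for j, i in enumerate(idx):
                direction = -probs[j]
                direction[actions[i]] += 1.0
                grad += np.outer(X[i], coeff[j] * direction)
            grad /= len(idx)

            # AdaGrad update; "smoothing" is used as the additive denominator term.
            G += grad ** 2
            W -= lr * grad / (np.sqrt(G) + smoothing)
    return W
```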