Bayesian Counterfactual Risk Minimization

Authors: Ben London, Ted Sandler

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We end with an empirical study of our theoretical results. First, we show that LPR outperforms standard L2 regularization whenever the logging policy is better than a uniform distribution. Second, we show that LPR is competitive with variance regularization, and even outperforms it on certain problems. Finally, we demonstrate that it is indeed possible to learn the logging policy for LPR with negligible impact on performance. These findings establish LPR as a simple, effective method for Bayesian CRM.
Researcher Affiliation | Industry | Amazon, Seattle, WA, USA.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide access to source code for the described methodology.
Open Datasets | Yes | Fashion-MNIST (Xiao et al., 2017) and CIFAR-100 (Krizhevsky and Hinton, 2009). A hedged loading sketch appears below the table.
Dataset Splits | Yes | Fashion-MNIST consists of 70,000 (60,000 training; 10,000 testing) grayscale images... we tune the regularization parameter, λ, using 5-fold cross-validation on each log dataset, with truncated IPS estimation of expected reward on the holdout set. A sketch of the truncated IPS estimator appears below the table.
Hardware Specification | No | The paper does not provide specific hardware details for running its experiments.
Software Dependencies | No | The paper mentions AdaGrad (Duchi et al., 2011) and general methods, but does not specify software names or version numbers needed for reproducibility (e.g., a Python or PyTorch version). A sketch of the AdaGrad update appears below the table.
Experiment Setup | Yes | We set the learning rate to 0.1 and the smoothing parameter to one... with minibatches of 100 examples... we run training for 500 epochs, with random shuffling of the training data at each epoch. All model parameters are initialized to zero... In all experiments, we set τ = 0.01. A training-loop sketch reflecting this setup appears below the table.
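
Both benchmarks are publicly available. The paper does not name its software stack, so the snippet below assumes torchvision purely as one convenient way to fetch them; any other loader for Fashion-MNIST and CIFAR-100 would work equally well.

```python
# Hedged sketch: torchvision is an assumption, not a dependency named by the paper.
from torchvision import datasets

fashion_train = datasets.FashionMNIST(root="data", train=True, download=True)
fashion_test = datasets.FashionMNIST(root="data", train=False, download=True)
cifar_train = datasets.CIFAR100(root="data", train=True, download=True)
cifar_test = datasets.CIFAR100(root="data", train=False, download=True)
```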
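
The model-selection criterion quoted under Dataset Splits relies on truncated (clipped) inverse propensity scoring of held-out log data. The following is a minimal sketch of that estimator, assuming logged propensities are recorded alongside each action; the clipping threshold `clip` is illustrative, not a value taken from the paper.

```python
import numpy as np

def truncated_ips(rewards, target_probs, logging_probs, clip=10.0):
    """Truncated IPS estimate of a policy's expected reward on logged data.

    rewards[i]       : reward observed for the logged action on example i
    target_probs[i]  : probability the evaluated policy assigns to that action
    logging_probs[i] : propensity the logging policy assigned to that action
    clip             : truncation threshold applied to the importance weights
    """
    weights = np.minimum(np.asarray(target_probs) / np.asarray(logging_probs), clip)
    return float(np.mean(weights * np.asarray(rewards)))
```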
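
The only optimizer the paper names is AdaGrad (Duchi et al., 2011), run with learning rate 0.1 and smoothing parameter 1. One common parameterization of the diagonal AdaGrad update is sketched below; the paper does not spell out its exact form, so treat this as an assumption.

```python
import numpy as np

def adagrad_step(params, grad, accum, lr=0.1, smoothing=1.0):
    """Diagonal AdaGrad update: scale each coordinate's step by the inverse
    square root of its accumulated squared gradients, plus a smoothing term."""
    accum = accum + grad ** 2
    params = params - lr * grad / (smoothing + np.sqrt(accum))
    return params, accum
```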
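
Putting the quoted hyperparameters together, a skeleton of the reported training setup might look like the following. The `loss_grad` callback is hypothetical: it stands in for the gradient of whichever regularized CRM objective (L2, variance, or LPR) is being trained, which this sketch does not implement.

```python
import numpy as np

def train(features, actions, rewards, logging_probs, loss_grad, n_params,
          epochs=500, batch_size=100, lr=0.1, smoothing=1.0, seed=0):
    """Zero-initialized parameters, minibatches of 100, 500 epochs with
    per-epoch shuffling, and diagonal AdaGrad updates, as reported."""
    rng = np.random.default_rng(seed)
    params = np.zeros(n_params)      # all model parameters initialized to zero
    accum = np.zeros(n_params)       # AdaGrad accumulator of squared gradients
    n = len(rewards)
    for _ in range(epochs):
        order = rng.permutation(n)   # reshuffle the log data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grad = loss_grad(params, features[idx], actions[idx],
                             rewards[idx], logging_probs[idx])
            accum += grad ** 2
            params = params - lr * grad / (smoothing + np.sqrt(accum))
    return params
```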