Bayesian Counterfactual Risk Minimization
Authors: Ben London, Ted Sandler
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We end with an empirical study of our theoretical results. First, we show that LPR outperforms standard L2 regularization whenever the logging policy is better than a uniform distribution. Second, we show that LPR is competitive with variance regularization, and even outperforms it on certain problems. Finally, we demonstrate that it is indeed possible to learn the logging policy for LPR with negligible impact on performance. These findings establish LPR as a simple, effective method for Bayesian CRM. |
| Researcher Affiliation | Industry | Amazon, Seattle, WA, USA. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. |
| Open Datasets | Yes | Fashion-MNIST (Xiao et al., 2017) and CIFAR-100 (Krizhevsky and Hinton, 2009). |
| Dataset Splits | Yes | Fashion-MNIST consists of 70,000 (60,000 training; 10,000 testing) grayscale images... we tune the regularization parameter, λ, using 5-fold cross-validation on each log dataset, with truncated IPS estimation of expected reward on the holdout set. (A hedged sketch of truncated IPS estimation appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions 'AdaGrad (Duchi et al., 2011)' and general methods, but does not specify any software names with version numbers for reproducibility (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | We set the learning rate to 0.1 and the smoothing parameter to one... with minibatches of 100 examples... we run training for 500 epochs, with random shuffling of the training data at each epoch. All model parameters are initialized to zero... In all experiments, we set τ = 0.01. (A hedged configuration sketch appears after the table.) |
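
The "Dataset Splits" row notes that the regularization parameter λ is tuned by 5-fold cross-validation, scoring each holdout fold with a truncated IPS estimate of expected reward, and the "Experiment Setup" row reports τ = 0.01. The sketch below illustrates one common form of such an estimator; the function name, the convention of flooring the logging propensities at τ, and the toy logged data are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def truncated_ips_estimate(rewards, logging_propensities, target_propensities, tau=0.01):
    """Truncated inverse-propensity-scoring (IPS) estimate of expected reward.

    rewards               -- observed rewards r_i for the logged actions
    logging_propensities  -- pi_0(a_i | x_i), propensities under the logging policy
    target_propensities   -- pi(a_i | x_i), propensities under the policy being evaluated
    tau                   -- truncation level; logging propensities are floored at tau
                             (one common convention; the quoted excerpt does not define tau precisely)
    """
    weights = target_propensities / np.maximum(logging_propensities, tau)
    return float(np.mean(rewards * weights))

# Hypothetical logged bandit feedback (five interactions), for illustration only.
r  = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
p0 = np.array([0.25, 0.10, 0.50, 0.05, 0.20])   # logging-policy propensities
p  = np.array([0.40, 0.05, 0.45, 0.30, 0.10])   # candidate-policy propensities
print(truncated_ips_estimate(r, p0, p))
```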
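
Similarly, the "Experiment Setup" row lists concrete optimization settings: AdaGrad with learning rate 0.1, a smoothing parameter of 1, minibatches of 100 examples, 500 epochs with reshuffling each epoch, all parameters initialized to zero, and τ = 0.01. A minimal training-loop sketch under those settings follows; the linear softmax policy, the negative truncated-IPS objective, and the reading of the smoothing parameter as the additive term in the AdaGrad denominator are assumptions, not specifics confirmed by the quoted text.

```python
import numpy as np

def adagrad_crm_train(X, actions, rewards, logging_prop, num_actions,
                      lr=0.1, smoothing=1.0, batch_size=100, epochs=500,
                      tau=0.01, seed=0):
    """Training-loop sketch using the quoted settings (lr=0.1, smoothing=1,
    minibatches of 100, 500 epochs with reshuffling, zero initialization).
    The linear softmax policy and negative truncated-IPS objective are assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, num_actions))   # all model parameters initialized to zero
    G = np.zeros_like(W)             # AdaGrad accumulator of squared gradients

    for _ in range(epochs):
        order = rng.permutation(n)   # random shuffling of the training data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            logits = X[idx] @ W
            logits -= logits.max(axis=1, keepdims=True)
            probs = np.exp(logits)
            probs /= probs.sum(axis=1, keepdims=True)

            # Truncated IPS weights for the logged actions (flooring at tau is an assumption).
            pi = probs[np.arange(len(idx)), actions[idx]]
            w = pi / np.maximum(logging_prop[idx], tau)

            # Gradient of the negative IPS objective for a linear softmax policy.
            grad = np.zeros_like(W)
            coeff = -rewards[idx] * w
            for j, i in enumerate(idx):
                direction = -probs[j]
                direction[actions[i]] += 1.0
                grad += np.outer(X[i], coeff[j] * direction)
            grad /= len(idx)

            # AdaGrad update; "smoothing" is used as the additive denominator term.
            G += grad ** 2
            W -= lr * grad / (np.sqrt(G) + smoothing)
    return W
```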