Surrogate Objectives for Batch Policy Optimization in One-step Decision Making

Authors: Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate how well optimizing the surrogate (10) minimizes true risk, using a separate test set for evaluation. As baselines, we compare to directly minimizing empirical risk... |
| Researcher Affiliation | Collaboration | Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans (Google; University of Alberta) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Appendix and code available at https://www.cs.ualberta.ca/~dale/neurips19/supplement |
| Open Datasets | Yes | MNIST: We first consider MNIST data, training a fully connected model with one hidden layer of 512 ReLU units. |
| Dataset Splits | Yes | The original training data was partitioned into the first 55K examples for training and the last 5K examples for validation. |
| Hardware Specification | No | The paper describes model architectures and datasets used but does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used for the experiments. |
| Software Dependencies | No | We set any unspecified model hyperparameters to the defaults for resnet in the open source tensor2tensor library [39] and tuned learning rate and the composite loss combination weights on validation data. |
| Experiment Setup | Yes | We use the validation data to select hyperparameters, including learning rate, mini-batch size, and combination weights (details in appendix). The policy was trained by minimizing each objective using SGD with momentum fixed at 0.9 [33] for 100 epochs. |
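For concreteness, the sketch below illustrates the MNIST setup reported above: a 55K/5K train/validation split, a fully connected model with one hidden layer of 512 ReLU units, and SGD with momentum 0.9 for 100 epochs. This is not the authors' code; it assumes TensorFlow/Keras (suggested only by the paper's tensor2tensor reference), uses a standard cross-entropy loss as a stand-in for the paper's surrogate objective (10), and the learning rate and batch size shown are hypothetical placeholders that the paper instead tunes on validation data.

```python
# Minimal sketch of the reported MNIST setup (illustrative only, not the authors' code).
import tensorflow as tf

# Load MNIST and split the 60K training images into 55K train / 5K validation,
# matching the split described in the paper.
(x_all, y_all), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_all = x_all.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
x_train, y_train = x_all[:55000], y_all[:55000]
x_val, y_val = x_all[55000:], y_all[55000:]

# Fully connected model with one hidden layer of 512 ReLU units.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10),
])

# SGD with momentum fixed at 0.9, trained for 100 epochs.
# Cross-entropy is a placeholder for the paper's surrogate objective (10);
# learning rate 0.01 and batch size 128 are assumptions, not reported values.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=128, epochs=100)
```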