Surrogate Objectives for Batch Policy Optimization in One-step Decision Making
Authors: Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate how well optimizing the surrogate (10) minimizes true risk, using a separate test set for evaluation. As baselines, we compare to directly minimizing empirical risk... |
| Researcher Affiliation | Collaboration | Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans (Google; University of Alberta) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Appendix and code available at https://www.cs.ualberta.ca/~dale/neurips19/supplement |
| Open Datasets | Yes | MNIST: We first consider MNIST data, training a fully connected model with one hidden layer of 512 ReLU units. |
| Dataset Splits | Yes | The original training data was partitioned into the first 55K examples for training and the last 5K examples for validation. |
| Hardware Specification | No | The paper describes model architectures and datasets used but does not specify any hardware details like GPU models, CPU types, or cloud computing resources used for experiments. |
| Software Dependencies | No | We set any unspecified model hyperparameters to the defaults for resnet in the open source tensor2tensor library [39] and tuned learning rate and the composite loss combination weights on validation data. |
| Experiment Setup | Yes | We use the validation data to select hyperparameters, including learning rate, mini-batch size, and combination weights (details in appendix). The policy was trained by minimizing each objective using SGD with momentum fixed at 0.9 [33] for 100 epochs. (A minimal code sketch of this setup follows the table.) |
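
Putting the Open Datasets, Dataset Splits, and Experiment Setup rows together, the following is a minimal sketch of the reported MNIST configuration, written here in PyTorch purely for illustration (the paper itself builds on the TensorFlow tensor2tensor library). The cross-entropy loss, learning rate of 0.01, and batch size of 128 are placeholder assumptions; the paper tunes learning rate, mini-batch size, and combination weights on the validation data and optimizes its surrogate objective (10) rather than plain cross-entropy.

```python
# Sketch of the reported MNIST setup: first 55K examples for training,
# last 5K for validation, a one-hidden-layer MLP with 512 ReLU units,
# trained with SGD (momentum 0.9) for 100 epochs. Not the authors' code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set = Subset(train_full, range(55_000))          # first 55K for training
valid_set = Subset(train_full, range(55_000, 60_000))  # last 5K for validation

model = nn.Sequential(nn.Flatten(),
                      nn.Linear(28 * 28, 512), nn.ReLU(),
                      nn.Linear(512, 10))

# Placeholder hyperparameters; the paper selects these on validation data.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Stand-in loss; the paper minimizes its surrogate objective (10) instead.
loss_fn = nn.CrossEntropyLoss()

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
for epoch in range(100):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

Evaluation on the separate test set and the composite-loss combination weights mentioned in the paper are omitted from this sketch.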