Surrogate Objectives for Batch Policy Optimization in One-step Decision Making

Authors: Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate how well optimizing the surrogate (10) minimizes true risk, using a separate test set for evaluation. As baselines, we compare to directly minimizing empirical risk... |
| Researcher Affiliation | Collaboration | Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans (Google; University of Alberta) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Appendix and code available at https://www.cs.ualberta.ca/~dale/neurips19/supplement |
| Open Datasets | Yes | MNIST: We first consider MNIST data, training a fully connected model with one hidden layer of 512 ReLU units. |
| Dataset Splits | Yes | The original training data was partitioned into the first 55K examples for training and the last 5K examples for validation. |
| Hardware Specification | No | The paper describes model architectures and datasets used but does not specify any hardware details such as GPU models, CPU types, or cloud computing resources used for the experiments. |
| Software Dependencies | No | We set any unspecified model hyperparameters to the defaults for resnet in the open source tensor2tensor library [39] and tuned learning rate and the composite loss combination weights on validation data. |
| Experiment Setup | Yes | We use the validation data to select hyperparameters, including learning rate, mini-batch size, and combination weights (details in appendix). The policy was trained by minimizing each objective using SGD with momentum fixed at 0.9 [33] for 100 epochs. |
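For concreteness, the sketch below illustrates the MNIST setup reported above: a 55K/5K train/validation split, a fully connected model with one hidden layer of 512 ReLU units, and SGD with momentum 0.9 for 100 epochs. This is not the authors' code; it assumes TensorFlow/Keras (suggested only by the paper's tensor2tensor reference), uses a standard cross-entropy loss as a stand-in for the paper's surrogate objective (10), and the learning rate and batch size shown are hypothetical placeholders that the paper instead tunes on validation data.

```python
# Minimal sketch of the reported MNIST setup (illustrative only, not the authors' code).
import tensorflow as tf

# Load MNIST and split the 60K training images into 55K train / 5K validation,
# matching the split described in the paper.
(x_all, y_all), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_all = x_all.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
x_train, y_train = x_all[:55000], y_all[:55000]
x_val, y_val = x_all[55000:], y_all[55000:]

# Fully connected model with one hidden layer of 512 ReLU units.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10),
])

# SGD with momentum fixed at 0.9, trained for 100 epochs.
# Cross-entropy is a placeholder for the paper's surrogate objective (10);
# learning rate 0.01 and batch size 128 are assumptions, not reported values.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          batch_size=128, epochs=100)
```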