Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Surrogate Objectives for Batch Policy Optimization in One-step Decision Making
Authors: Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate how well optimizing the surrogate (10) minimizes true risk, using a separate test set for evaluation. As baselines, we compare to directly minimizing empirical risk... |
| Researcher Affiliation | Collaboration | Minmin Chen, Ramki Gummadi, Chris Harris, Dale Schuurmans (Google; University of Alberta) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Appendix and code available at https://www.cs.ualberta.ca/~dale/neurips19/supplement |
| Open Datasets | Yes | MNIST: We first consider MNIST data, training a fully connected model with one hidden layer of 512 ReLU units. |
| Dataset Splits | Yes | The original training data was partitioned into the first 55K examples for training and the last 5K examples for validation. |
| Hardware Specification | No | The paper describes model architectures and datasets used but does not specify any hardware details like GPU models, CPU types, or cloud computing resources used for experiments. |
| Software Dependencies | No | We set any unspecified model hyperparameters to the defaults for resnet in the open source tensor2tensor library [39] and tuned learning rate and the composite loss combination weights on validation data. |
| Experiment Setup | Yes | We use the validation data to select hyperparameters, including learning rate, mini-batch size, and combination weights (details in appendix). The policy was trained by minimizing each objective using SGD with momentum fixed at 0.9 [33] for 100 epochs. |
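The Dataset Splits row describes a positional split of the original training data: the first 55K examples for training and the last 5K for validation, with no shuffling mentioned. The sketch below illustrates that split; the function name `split_first_last` is hypothetical, and stand-in indices replace the actual MNIST arrays (the standard MNIST training set has 60,000 examples).

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_first_last(
    examples: Sequence[T], n_train: int = 55_000, n_val: int = 5_000
) -> Tuple[Sequence[T], Sequence[T]]:
    """Keep the original order: the first n_train items train, the last n_val validate."""
    if len(examples) != n_train + n_val:
        raise ValueError(f"expected {n_train + n_val} examples, got {len(examples)}")
    # No shuffling: the split is purely positional.
    return examples[:n_train], examples[n_train:]


# Usage with stand-in indices for the 60,000 MNIST training examples.
train_idx, val_idx = split_first_last(range(60_000))
print(len(train_idx), len(val_idx), val_idx[0])  # 55000 5000 55000
```

Because the split is positional, anyone reproducing it only needs the original file order, not a random seed.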
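The Open Datasets row describes a fully connected model with one hidden layer of 512 ReLU units. The sketch below shows a forward pass of such a network, assuming 784 inputs (a flattened 28x28 image), 10 outputs (one per class label, treated as actions), and a softmax over the outputs to form a policy. The input and output sizes, the omission of bias terms, the uniform initialization, and the names `init_layer` and `mlp_policy` are all assumptions for illustration, not details taken from the paper.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    # Hypothetical init: uniform in [-1/sqrt(n_in), 1/sqrt(n_in)]; not the paper's stated scheme.
    bound = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]


def mlp_policy(x: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """784 -> 512 ReLU -> 10 logits -> softmax probabilities (biases omitted for brevity)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
w1 = init_layer(784, 512, rng)
w2 = init_layer(512, 10, rng)
x = [rng.random() for _ in range(784)]  # stand-in for a flattened 28x28 image
probs = mlp_policy(x, w1, w2)
print(len(probs), round(sum(probs), 6))  # 10 1.0
```

Pure Python keeps the sketch dependency-free; a real reproduction would use an array or deep learning library instead.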
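The Experiment Setup row states that each objective was minimized with SGD using momentum fixed at 0.9. The sketch below shows one common (heavy-ball) formulation of that update, v ← 0.9·v − η·g followed by θ ← θ + v. Whether the paper uses this variant or another (for example, Nesterov momentum) is not stated in the table, and the learning rate and toy objective here are hypothetical. In SGD, `grad` would return a mini-batch gradient estimate; the toy uses the exact gradient of f(θ) = ½‖θ‖².

```python
from typing import Callable, List


def sgd_momentum(
    grad: Callable[[List[float]], List[float]],
    theta: List[float],
    lr: float,
    steps: int,
    momentum: float = 0.9,
) -> List[float]:
    """Heavy-ball update: v <- momentum * v - lr * g;  theta <- theta + v."""
    velocity = [0.0] * len(theta)
    theta = list(theta)  # work on a copy so the caller's list is untouched
    for _ in range(steps):
        g = grad(theta)
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta


# Toy check on f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
final = sgd_momentum(lambda th: list(th), theta=[5.0, -3.0], lr=0.1, steps=300)
print(final)  # both coordinates should be close to 0
```

Note that `steps` counts parameter updates, whereas the paper's 100 epochs counts full passes over the training data; the number of updates per epoch depends on the mini-batch size tuned on validation data.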