Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Safe Policy Improvement by Minimizing Robust Baseline Regret
Authors: Mohammad Ghavamzadeh, Marek Petrik, Yinlam Chow
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on several domains further show that even the simple approximate algorithm can outperform standard approaches. In this section, we experimentally evaluate the benefits of minimizing the robust baseline regret. |
| Researcher Affiliation | Collaboration | Marek Petrik University of New Hampshire EMAIL Mohammad Ghavamzadeh Adobe Research & INRIA Lille EMAIL Yinlam Chow Stanford University EMAIL |
| Pseudocode | Yes | Algorithm 1: Approximate Robust Baseline Regret Minimization Algorithm |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository. |
| Open Datasets | No | We use a uniform random policy to gather samples. The problem is based on the domain from [Petrik and Wu, 2015], whose description is detailed in Appendix I.2. |
| Dataset Splits | No | No specific training, validation, or test dataset splits are described in terms of percentages or sample counts. The paper mentions 'mean of 40 runs' and 'averaged over 5 runs' for experiments, but not data splitting. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) are mentioned for running experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper describes how the model error function is constructed from samples and details of the problem domains (Grid Problem, Energy Arbitrage), including aspects like "uniform random policy to gather samples" and "number of transition samples used in constructing the uncertain model." However, it does not provide specific algorithmic hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configurations for the proposed algorithms. |