Safe Policy Improvement by Minimizing Robust Baseline Regret

Authors: Mohammad Ghavamzadeh, Marek Petrik, Yinlam Chow

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical results on several domains further show that even the simple approximate algorithm can outperform standard approaches. In this section, we experimentally evaluate the benefits of minimizing the robust baseline regret."
Researcher Affiliation | Collaboration | Marek Petrik (University of New Hampshire, mpetrik@cs.unh.edu); Mohammad Ghavamzadeh (Adobe Research & INRIA Lille, ghavamza@adobe.com); Yinlam Chow (Stanford University, ychow@stanford.edu)
Pseudocode | Yes | Algorithm 1: Approximate Robust Baseline Regret Minimization Algorithm (see the sketch after this table)
Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository.
Open Datasets | No | "We use a uniform random policy to gather samples." The problem is based on the domain from [Petrik and Wu, 2015], whose description is detailed in Appendix I.2.
Dataset Splits | No | No training, validation, or test splits are described in terms of percentages or sample counts. The paper mentions a "mean of 40 runs" and results "averaged over 5 runs", but no data splitting.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) are mentioned for running the experiments.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned.
Experiment Setup | No | The paper describes how the model error function is constructed from samples and gives details of the problem domains (Grid Problem, Energy Arbitrage), such as the "uniform random policy to gather samples" and the "number of transition samples used in constructing the uncertain model" (a hedged sketch of such an error function appears below). However, it does not provide algorithmic hyperparameters (e.g., learning rate, batch size, number of epochs) or other detailed training configurations for the proposed algorithms.
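For readers who want a concrete picture of the kind of procedure the Pseudocode row points to, the sketch below is a minimal illustration, not the paper's implementation. It assumes an s,a-rectangular L1 uncertainty set around a nominal tabular model and a decision rule of the form "deploy the robust-optimal candidate only if its worst-case return beats the baseline's worst-case return, otherwise keep the baseline." The function names (worst_case_distribution, robust_value_iteration, safe_policy_improvement), the discount factor, and all constants are illustrative assumptions.

```python
import numpy as np


def worst_case_distribution(p_nominal, values, radius):
    """Worst-case transition distribution within an L1 ball of the given radius
    around the nominal distribution: shift up to radius/2 of probability mass
    from the highest-value next states onto the lowest-value next state."""
    p = p_nominal.copy()
    worst = int(np.argmin(values))
    budget = radius / 2.0
    for s in np.argsort(values)[::-1]:  # highest-value next states first
        if s == worst or budget <= 0:
            continue
        shift = min(budget, p[s])
        p[s] -= shift
        p[worst] += shift
        budget -= shift
    return p


def robust_value_iteration(P, R, error, gamma=0.95, iters=1000, policy=None):
    """Robust value iteration for an s,a-rectangular L1 uncertainty model.
    If `policy` is given, evaluates its worst-case value; otherwise optimizes
    over actions to obtain a robust-optimal policy."""
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p = worst_case_distribution(P[s, a], v, error[s, a])
                q[s, a] = R[s, a] + gamma * p @ v
        v_new = q[np.arange(n_states), policy] if policy is not None else q.max(axis=1)
        if np.max(np.abs(v_new - v)) < 1e-8:
            v = v_new
            break
        v = v_new
    return v, q.argmax(axis=1)


def safe_policy_improvement(P, R, error, baseline, initial_dist, gamma=0.95):
    """Return the robust-optimal candidate policy only if its worst-case return
    improves on the baseline's worst-case return; otherwise keep the baseline."""
    v_candidate, candidate = robust_value_iteration(P, R, error, gamma)
    v_baseline, _ = robust_value_iteration(P, R, error, gamma, policy=baseline)
    if initial_dist @ v_candidate > initial_dist @ v_baseline:
        return candidate
    return baseline
```

In this reading, the fallback to the baseline is what provides the safety property: the returned policy is never worse than the baseline under the worst model in the uncertainty set.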
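The model error function mentioned in the Experiment Setup row is likewise not spelled out here. The sketch below shows one standard way such an error function is built from transition counts in the robust-MDP literature (a Weissman-style L1 concentration bound), together with the uniform-random sample gathering mentioned in the Open Datasets row. The bound's constants, the assumed step(s, a) simulator interface, and the treatment of unvisited state-action pairs are illustrative assumptions and may differ from the paper's appendix.

```python
import numpy as np


def gather_uniform_samples(step, n_states, n_actions, n_samples, seed=0):
    """Collect (s, a, s') transition samples by choosing states and actions
    uniformly at random; `step(s, a)` is an assumed simulator returning s'."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        s = int(rng.integers(n_states))
        a = int(rng.integers(n_actions))
        samples.append((s, a, step(s, a)))
    return samples


def empirical_model_and_error(samples, n_states, n_actions, delta=0.05):
    """Nominal transition model and per-(s, a) L1 error radius from samples.
    Uses a Weissman-style bound, a common choice for small state spaces;
    the paper's exact constants may differ."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in samples:
        counts[s, a, s_next] += 1.0
    n_sa = counts.sum(axis=2)
    # Maximum-likelihood model; fall back to uniform where a pair was never visited.
    P_hat = np.where(n_sa[..., None] > 0,
                     counts / np.maximum(n_sa[..., None], 1.0),
                     1.0 / n_states)
    # L1 radius shrinking as 1/sqrt(n); unvisited pairs get the maximal radius 2.
    log_term = n_states * np.log(2.0) - np.log(delta)
    error = np.sqrt(2.0 * log_term / np.maximum(n_sa, 1.0))
    error = np.where(n_sa > 0, np.minimum(error, 2.0), 2.0)
    return P_hat, error
```

With these two pieces, the safe_policy_improvement sketch above can be run end to end on a small tabular domain by passing P_hat and error as the nominal model and uncertainty radii.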