Twice regularized MDPs and the equivalence between robustness and regularization

Authors: Esther Derman, Matthieu Geist, Shie Mannor

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | (Sec. 6, "Numerical Experiments") "We aim to compare the computing time of R2 MPI with that of MPI [30] and robust MPI [18]. The code is available at https://github.com/EstherDerman/r2mdp. To do so, we run experiments on an Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz machine, which we test on a 5×5 grid-world domain."
Researcher Affiliation | Collaboration | Esther Derman (Technion); Matthieu Geist (Google Research, Brain Team); Shie Mannor (Technion, NVIDIA Research)
Pseudocode | Yes | Algorithm 1 (R2 MPI). Result: π_{k+1}, v_{k+1}. Initialize v_0 ∈ ℝ^S; while not converged do: π_{k+1} ← G_{Ω_{R2}}(v_k); v_{k+1} ← (T_{R2}^{π_{k+1}})^m v_k; end
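The greedy-then-partial-evaluation loop of Algorithm 1 can be sketched in code. This is a hypothetical illustration, not the authors' implementation: the function name `r2_mpi`, the toy MDP, and the use of a flat reward penalty `alpha` in place of the full R2 regularizer Ω_{R2} (which in the paper also covers transition uncertainty) are all assumptions.

```python
import numpy as np

def r2_mpi(P, r, gamma=0.9, alpha=0.1, m=5, iters=200, tol=1e-8):
    """Sketch of R2 MPI on a tabular MDP.

    P: (S, A, S) transition tensor, r: (S, A) reward matrix.
    `alpha` is a stand-in for the R2 regularizer: here it simply
    penalizes every reward by the uncertainty-ball radius.
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    pi = np.zeros(S, dtype=int)
    for _ in range(iters):
        # Regularized greedy step G_{Omega_R2}(v_k): act greedily
        # with respect to the penalized rewards r - alpha.
        q = (r - alpha) + gamma * (P @ v)          # shape (S, A)
        pi = q.argmax(axis=1)
        # m applications of the evaluation operator (T_{R2}^{pi})^m v_k.
        v_new = v
        for _ in range(m):
            v_new = (r - alpha)[np.arange(S), pi] + gamma * (P[np.arange(S), pi] @ v_new)
        if np.max(np.abs(v_new - v)) < tol:        # convergence check
            v = v_new
            break
        v = v_new
    return pi, v
```

With `alpha=0` this reduces to standard MPI; a positive `alpha` shifts every state value down by `alpha / (1 - gamma)` when the greedy policy is unchanged, which is the simplest visible effect of the regularizer.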
Open Source Code | Yes | "The code is available at https://github.com/EstherDerman/r2mdp."
Open Datasets | No | The experiments use a 5×5 grid-world domain, a custom simulation environment rather than a publicly available dataset with access details such as a URL, DOI, or formal citation.
Dataset Splits | No | The paper evaluates algorithms in a custom 5×5 grid-world environment; it does not mention train/validation/test splits or give percentages or sample counts for any such splits.
Hardware Specification | Yes | "To do so, we run experiments on an Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz machine."
Software Dependencies | No | The paper links the code ("The code is available at https://github.com/EstherDerman/r2mdp") but does not list any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | "Parameter values and other implementation details are deferred to Appx. D. We obtain the same value for R2 PE and robust PE, which numerically confirms Thm. 4.1. For simplicity, we focus on an (s, a)-rectangular uncertainty set and take the same ball radius α (resp. β) at each state-action pair for the reward function (resp. transition function)."
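The agreement between robust PE and R2 PE quoted above can be checked numerically in its simplest form. The sketch below is a hypothetical toy, not the paper's Appx. D setup: for an (s, a)-rectangular reward-uncertainty ball of radius `alpha` (and no transition uncertainty, i.e. β = 0), robust policy evaluation with an explicit inner minimization coincides with regularized policy evaluation that penalizes rewards by `alpha`. The function names and the random test MDP are assumptions.

```python
import numpy as np

def regularized_pe(P_pi, r_pi, alpha, gamma=0.9):
    """Regularized PE: penalize rewards by the ball radius alpha,
    then solve (I - gamma * P_pi) v = r_pi - alpha exactly."""
    S = len(r_pi)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi - alpha)

def robust_pe(P_pi, r_pi, alpha, gamma=0.9, sweeps=500):
    """Robust PE: fixed-point iteration with an explicit inner
    minimization over reward perturbations u, |u(s)| <= alpha."""
    v = np.zeros(len(r_pi))
    us = np.linspace(-alpha, alpha, 5)   # grid over the uncertainty ball
    for _ in range(sweeps):
        worst_r = np.min(r_pi[:, None] + us[None, :], axis=1)
        v = worst_r + gamma * (P_pi @ v)
    return v
```

For reward-only uncertainty the inner minimum is attained at `r - alpha`, so the two routines return the same value function; the theorem in the paper establishes the equivalence in far greater generality, including transition uncertainty of radius β.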