Twice regularized MDPs and the equivalence between robustness and regularization
Authors: Esther Derman, Matthieu Geist, Shie Mannor
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Numerical Experiments): "We aim to compare the computing time of R2 MPI with that of MPI [30] and robust MPI [18]. The code is available at https://github.com/EstherDerman/r2mdp. To do so, we run experiments on an Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz machine, which we test on a 5×5 grid-world domain." |
| Researcher Affiliation | Collaboration | Esther Derman (Technion); Matthieu Geist (Google Research, Brain Team); Shie Mannor (Technion, NVIDIA Research) |
| Pseudocode | Yes | Algorithm 1 (R2 MPI). Result: π_{k+1}, v_{k+1}. Initialize v_k ∈ ℝ^S; while not converged do: π_{k+1} ← G_{Ω_{R2}}(v_k); v_{k+1} ← (T^{π_{k+1}, R2})^m v_k; end. (A runnable sketch of this loop appears after the table.) |
| Open Source Code | Yes | The code is available at https://github.com/EstherDerman/r2mdp. |
| Open Datasets | No | The paper describes using a '5x5 grid-world domain' for experiments, which is a custom simulation environment rather than a publicly available dataset with specific access details like a URL, DOI, or formal citation. |
| Dataset Splits | No | The paper describes a custom simulation environment ('5x5 grid-world domain') and evaluates algorithms within it. It does not mention using standard train/validation/test splits from a dataset or provide specific percentages or sample counts for such splits. |
| Hardware Specification | Yes | To do so, we run experiments on an Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz machine |
| Software Dependencies | No | The paper states that 'The code is available at https://github.com/EstherDerman/r2mdp' but does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Parameter values and other implementation details are deferred to Appx. D. We obtain the same value for R2 PE and robust PE, which numerically confirms Thm. 4.1. For simplicity, we focus on an (s, a)-rectangular uncertainty set and take the same ball radius α (resp. β) at each state-action pair for the reward function (resp. transition function). |
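
To make the quoted pseudocode concrete, below is a minimal Python sketch of the R2 MPI loop on a tabular MDP. It assumes an (s, a)-rectangular uncertainty set with a common reward radius `alpha` and transition radius `beta`, matching the setup quoted above; the exact form of the regularization penalty (here `alpha + gamma * beta * ||v||` subtracted from every Q-value, which corresponds to a deterministic greedy policy) is an illustrative simplification, not the authors' implementation.

```python
import numpy as np

def r2_q_values(P, R, v, gamma, alpha, beta):
    """Regularized Q-values: the nominal Bellman backup minus a penalty that
    grows with the reward radius alpha and with ||v|| times the transition
    radius beta (illustrative form of the R2 regularization)."""
    # P: (S, A, S) transition tensor, R: (S, A) rewards, v: (S,) value vector.
    return R + gamma * P.dot(v) - (alpha + gamma * beta * np.linalg.norm(v))

def r2_mpi(P, R, gamma=0.9, alpha=0.1, beta=0.01, m=5, tol=1e-6, max_iter=1000):
    """Modified policy iteration with the regularized backup above (R2 MPI sketch)."""
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Greedy step: pi_{k+1} chosen greedily w.r.t. the regularized Q-values.
        q = r2_q_values(P, R, v, gamma, alpha, beta)
        pi = q.argmax(axis=1)
        # Partial evaluation: apply the regularized policy operator m times.
        v_new = v.copy()
        for _ in range(m):
            q = r2_q_values(P, R, v_new, gamma, alpha, beta)
            v_new = q[np.arange(S), pi]
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return pi, v_new
```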
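
A hypothetical invocation of the sketch above on a randomly generated 25-state, 4-action MDP (matching the size of the paper's 5×5 grid-world, whose actual dynamics and parameter values are deferred to Appx. D and not reproduced here):

```python
import numpy as np

# Toy stand-in for the 5x5 grid-world: random transition kernels and rewards.
rng = np.random.default_rng(0)
S, A = 25, 4
P = rng.random((S, A, S))
P /= P.sum(axis=-1, keepdims=True)   # normalize rows into valid distributions
R = rng.random((S, A))

pi, v = r2_mpi(P, R, gamma=0.9, alpha=0.1, beta=0.01, m=5)
print(pi.shape, v.shape)             # (25,), (25,)
```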