reproducibilityindex.ai

Policy Gradient for Rectangular Robust Markov Decision Processes

Authors: Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Y. Levy, Shie Mannor

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"Experiments show that our RPG speeds up state-of-the-art robust PG updates by 2 orders of magnitude." and "In the following experiments, we randomly generate nominal models for arbitrary state-action space sizes. Each experiment was averaged over 100 runs."
Researcher Affiliation	Collaboration	Navdeep Kumar (Technion); Esther Derman (MILA, Université de Montréal); Matthieu Geist (Google DeepMind); Kfir Levy (Technion); Shie Mannor (Technion, NVIDIA Research)
Pseudocode	Yes	Algorithm 1 RPG
Open Source Code	Yes	All codes and results are available at https://github.com/navdtech/rpg.
Open Datasets	No	In the following experiments, we randomly generate nominal models for arbitrary state-action space sizes.
Dataset Splits	No	The paper does not provide specific dataset split information (train/validation/test) as it uses randomly generated models rather than a pre-existing dataset with defined splits.
Hardware Specification	Yes	Hardware: Experiments are done on the machine with the following configuration: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, size: 3598MHz, capacity: 4GHz, width: 64 bits, memory size: 64 GiB.
Software Dependencies	No	All the experiments were done in Python using numpy, matplotlib.
Experiment Setup	Yes	Discount factor γ = 0.9, reward noise radius α_{s,a}, α_s = 0.1, transition noise kernel β_{s,a}, β_s = 0.01
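
The setup rows above say that nominal models are generated at random for arbitrary state-action space sizes, with γ = 0.9 and noise parameters 0.1 and 0.01. A minimal sketch of what such a generator might look like is given below. It is not the authors' code: the function name `random_nominal_model`, the uniform sampling scheme, and the use of the standard-library `random` module (the paper reports using numpy) are all assumptions made for illustration.

```python
import random

# Hyperparameters quoted in the Experiment Setup row.
GAMMA = 0.9   # discount factor
ALPHA = 0.1   # reward noise radius (alpha_{s,a} and alpha_s)
BETA = 0.01   # transition noise parameter (beta_{s,a} and beta_s)


def random_nominal_model(n_states, n_actions, rng):
    """Hypothetical helper: sample a nominal MDP (P, R).

    P[s][a] is a probability vector over next states and R[s][a] is a
    scalar reward. Sampling uniformly in [0, 1) and normalizing is an
    assumption; the source does not state the distribution used.
    """
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            # Small offset keeps the normalizing constant strictly positive.
            weights = [rng.random() + 1e-12 for _ in range(n_states)]
            total = sum(weights)
            P_s.append([w / total for w in weights])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


# Fixed seed so a run can be repeated; sizes here are arbitrary examples.
rng = random.Random(0)
P, R = random_nominal_model(n_states=5, n_actions=3, rng=rng)
```

To mirror the reported protocol of averaging over 100 runs, one option would be to call the helper once per run with a different seed and average the resulting measurements.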