Policy Gradient for Rectangular Robust Markov Decision Processes
Authors: Navdeep Kumar, Esther Derman, Matthieu Geist, Kfir Y. Levy, Shie Mannor
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our RPG speeds up state-of-the-art robust PG updates by 2 orders of magnitude. In the following experiments, we randomly generate nominal models for arbitrary state-action space sizes. Each experiment was averaged over 100 runs. |
| Researcher Affiliation | Collaboration | Navdeep Kumar (Technion); Esther Derman (MILA, Université de Montréal); Matthieu Geist (Google DeepMind); Kfir Y. Levy (Technion); Shie Mannor (Technion, NVIDIA Research) |
| Pseudocode | Yes | Algorithm 1 RPG |
| Open Source Code | Yes | All codes and results are available at https://github.com/navdtech/rpg. |
| Open Datasets | No | In the following experiments, we randomly generate nominal models for arbitrary state-action space sizes. |
| Dataset Splits | No | The paper does not provide specific dataset split information (train/validation/test) as it uses randomly generated models rather than a pre-existing dataset with defined splits. |
| Hardware Specification | Yes | Experiments are done on a machine with the following configuration: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz, clock: 3598 MHz, capacity: 4 GHz, width: 64 bits, memory: 64 GiB. |
| Software Dependencies | No | No version information is provided: "All the experiments were done in Python using numpy, matplotlib." |
| Experiment Setup | Yes | Discount factor γ = 0.9; reward noise radius α_{s,a} = α_s = 0.1; transition kernel noise radius β_{s,a} = β_s = 0.01 |
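The setup row above describes randomly generated nominal models with fixed uncertainty radii. A minimal sketch of such a generator is below; the exact construction is an assumption (the function name `make_nominal_model` and the uniform sampling are illustrative, not taken from the authors' code at https://github.com/navdtech/rpg):

```python
import numpy as np

def make_nominal_model(n_states, n_actions, seed=0):
    """Randomly generate a nominal MDP model (hypothetical construction)."""
    rng = np.random.default_rng(seed)
    # Nominal transition kernel: P[s, a] is a distribution over next states.
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=-1, keepdims=True)
    # Nominal reward table R[s, a], drawn uniformly in [0, 1].
    R = rng.random((n_states, n_actions))
    return P, R

# Values reported in the review's setup row.
gamma = 0.9   # discount factor
alpha = 0.1   # reward noise radius (alpha_{s,a} = alpha_s)
beta = 0.01   # transition kernel noise radius (beta_{s,a} = beta_s)

P, R = make_nominal_model(n_states=10, n_actions=5)
```

Averaging over 100 such seeded draws would reproduce the "averaged over 100 runs" protocol mentioned in the table.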