Online Robust Reinforcement Learning with Model Uncertainty
Authors: Yue Wang, Shaofeng Zou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments further demonstrate the robustness of our algorithms. |
| Researcher Affiliation | Academia | Yue Wang, University at Buffalo, Buffalo, NY 14228 (ywang294@buffalo.edu); Shaofeng Zou, University at Buffalo, Buffalo, NY 14228 (szou3@buffalo.edu) |
| Pseudocode | Yes | Algorithm 1 Robust Q-Learning; Algorithm 2 Robust TDC with Linear Function Approximation (a hedged sketch of the Q-learning variant follows the table) |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code is made available. |
| Open Datasets | Yes | We use the OpenAI Gym framework [Brockman et al., 2016] and consider two different problems: FrozenLake and CartPole. |
| Dataset Splits | No | The paper describes training on a 'perturbed MDP' and testing on an 'unperturbed MDP' but does not specify a separate validation split or its methodology. |
| Hardware Specification | No | The paper does not specify any hardware used for the experiments (e.g., CPU, GPU models). |
| Software Dependencies | No | The paper mentions 'Open AI gym framework' but does not provide version numbers for this or any other software components. |
| Experiment Setup | Yes | The behavior policy for all the experiments below is set to be a uniform distribution over the action space given any state, i.e., π_b(a|s) = 1/|A| for any s ∈ S and a ∈ A. We take the average over 30 trajectories. We set α = 0.2 and γ = 0.9. |
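
To make the reported setup concrete, below is a minimal sketch of the paper's Algorithm 1 (Robust Q-Learning) on a discrete Gym environment, assuming the R-contamination form of the robust Bellman target the paper studies. The step size α = 0.2, the discount γ = 0.9, and the uniform behavior policy come from the quoted experiment setup; the function name `robust_q_learning`, the contamination radius `R = 0.1`, and the episode count are illustrative assumptions, not values from the paper.

```python
import numpy as np

def robust_q_learning(env, episodes=1000, alpha=0.2, gamma=0.9, R=0.1):
    """Sketch of robust Q-learning under an R-contamination uncertainty set.

    alpha and gamma follow the paper's reported settings; R and the episode
    count are illustrative. Assumes the classic Gym API (Brockman et al.,
    2016): reset() -> obs, step(a) -> (obs, reward, done, info), with
    discrete observation and action spaces (e.g., FrozenLake).
    """
    n_actions = env.action_space.n
    Q = np.zeros((env.observation_space.n, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Behavior policy: uniform over actions, matching the paper's setup.
            a = np.random.randint(n_actions)
            s_next, r, done, _ = env.step(a)
            # R-contamination robust target: mix the nominal next-state value
            # with the worst-case value over all states.
            nominal = Q[s_next].max()
            worst = Q.max(axis=1).min()
            Q[s, a] += alpha * (r + gamma * ((1 - R) * nominal + R * worst) - Q[s, a])
            s = s_next
    return Q
```

For example, `robust_q_learning(gym.make("FrozenLake-v0"))` would train a robust tabular policy; setting R = 0 recovers standard Q-learning, which is one way to check the sketch against a non-robust baseline.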