Online Robust Reinforcement Learning with Model Uncertainty

Authors: Yue Wang, Shaofeng Zou

NeurIPS 2021

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Our numerical experiments further demonstrate the robustness of our algorithms."
Researcher Affiliation | Academia | Yue Wang, University at Buffalo, Buffalo, NY 14228, ywang294@buffalo.edu; Shaofeng Zou, University at Buffalo, Buffalo, NY 14228, szou3@buffalo.edu
Pseudocode | Yes | Algorithm 1 (Robust Q-Learning) and Algorithm 2 (Robust TDC with Linear Function Approximation); a minimal sketch of Algorithm 1 appears after this table.
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code is made available.
Open Datasets | Yes | "We use Open AI gym framework [Brockman et al., 2016], and consider two different problems: Frozen lake and Cart-Pole."
Dataset Splits | No | The paper describes training on a "perturbed MDP" and testing on an "unperturbed MDP" but does not specify a separate validation split or its methodology.
Hardware Specification | No | The paper does not specify any hardware used for the experiments (e.g., CPU or GPU models).
Software Dependencies | No | The paper mentions the "Open AI gym framework" but does not provide version numbers for it or any other software component.
Experiment Setup | Yes | "The behavior policy for all the experiments below is set to be a uniform distribution over the action space given any state, i.e., πb(a|s) = 1/|A| for any s ∈ S and a ∈ A. We take the average over 30 trajectories. We set α = 0.2 and γ = 0.9." A hypothetical driver for this setup follows the algorithm sketch below.
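
For concreteness, here is a minimal sketch of the kind of update Algorithm 1 (Robust Q-Learning) performs, assuming an R-contamination uncertainty set in which an adversary can redirect the transition with probability R. The function name, the radius R, the terminal-state handling, and the Gymnasium-style environment interface are our own assumptions, not the authors' code:

```python
import numpy as np

def robust_q_learning(env, num_steps=100_000, alpha=0.2, gamma=0.9, R=0.1, seed=0):
    """Tabular robust Q-learning sketch under an R-contamination
    uncertainty set: with probability R the adversary may move the
    agent anywhere, so the worst-case next-state value mixes the
    observed next state (weight 1 - R) with the globally worst
    state (weight R)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = env.observation_space.n, env.action_space.n
    Q = np.zeros((n_states, n_actions))
    s, _ = env.reset(seed=seed)
    for _ in range(num_steps):
        a = int(rng.integers(n_actions))  # uniform behavior policy: pi_b(a|s) = 1/|A|
        s_next, r, terminated, truncated, _ = env.step(a)
        V = Q.max(axis=1)                 # greedy value estimate V(s) = max_a Q(s, a)
        worst_case = (1.0 - R) * V[s_next] + R * V.min()
        target = r + (0.0 if terminated else gamma * worst_case)
        Q[s, a] += alpha * (target - Q[s, a])  # step size alpha = 0.2 in the paper
        s = s_next
        if terminated or truncated:
            s, _ = env.reset()
    return Q
```

The only departure from vanilla Q-learning is the worst_case line: setting R = 0 recovers the standard Bellman target.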
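
And a hypothetical driver matching the reported setup: uniform behavior policy, alpha = 0.2, gamma = 0.9, results averaged over 30 independent runs, training on perturbed dynamics and evaluating on unperturbed ones. It reuses robust_q_learning from the sketch above; treating FrozenLake's is_slippery flag as the perturbation is our stand-in, not the paper's exact protocol:

```python
import numpy as np
import gymnasium as gym  # assumes the modern Gymnasium API, not 2016-era gym

def evaluate_greedy(env, Q, episodes=100, gamma=0.9):
    """Average discounted return of the greedy policy derived from Q."""
    returns = []
    for _ in range(episodes):
        s, _ = env.reset()
        g, discount, done = 0.0, 1.0, False
        while not done:
            s, r, terminated, truncated, _ = env.step(int(np.argmax(Q[s])))
            g += discount * r
            discount *= gamma
            done = terminated or truncated
        returns.append(g)
    return float(np.mean(returns))

if __name__ == "__main__":
    # Train on stochastic ("perturbed") dynamics and test on deterministic
    # ("unperturbed") dynamics: a stand-in for the paper's protocol.
    train_env = gym.make("FrozenLake-v1", is_slippery=True)
    test_env = gym.make("FrozenLake-v1", is_slippery=False)
    scores = [
        evaluate_greedy(test_env, robust_q_learning(train_env, seed=run))
        for run in range(30)  # the paper averages over 30 trajectories
    ]
    print(f"mean test return over 30 runs: {np.mean(scores):.3f}")
```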