Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Online Robust Reinforcement Learning with Model Uncertainty

Authors: Yue Wang, Shaofeng Zou

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our numerical experiments further demonstrate the robustness of our algorithms."
Researcher Affiliation | Academia | Yue Wang, University at Buffalo, Buffalo, NY 14228, EMAIL; Shaofeng Zou, University at Buffalo, Buffalo, NY 14228, EMAIL
Pseudocode | Yes | Algorithm 1: Robust Q-Learning; Algorithm 2: Robust TDC with Linear Function Approximation
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that code is made available.
Open Datasets | Yes | "We use Open AI gym framework [Brockman et al., 2016], and consider two different problems: Frozen lake and Cart-Pole."
Dataset Splits | No | The paper describes training on a "perturbed MDP" and testing on an "unperturbed MDP" but does not specify a separate validation split or its methodology.
Hardware Specification | No | The paper does not specify any hardware used for the experiments (e.g., CPU or GPU models).
Software Dependencies | No | The paper mentions the "Open AI gym framework" but does not provide version numbers for it or any other software components.
Experiment Setup | Yes | "The behavior policy for all the experiments below is set to be a uniform distribution over the action space given any state, i.e., π_b(a|s) = 1/|A| for any s ∈ S and a ∈ A. We take the average over 30 trajectories. We set α = 0.2 and γ = 0.9."
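The reported setup (uniform behavior policy π_b(a|s) = 1/|A|, α = 0.2, γ = 0.9) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy 4-state MDP is hypothetical, and the tabular update shown is standard Q-learning, whereas the paper's robust variant replaces the next-state expectation with a worst case over an uncertainty set of transition models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2   # hypothetical toy MDP sizes, for illustration only
alpha, gamma = 0.2, 0.9      # step size and discount from the reported setup

def behavior_policy(state: int) -> int:
    """Uniform behavior policy: each action has probability 1/|A|."""
    return int(rng.integers(n_actions))

def step(s: int, a: int):
    """Hypothetical ring dynamics: action 1 moves right, action 0 moves left;
    reward 1 for entering state 0, else 0."""
    s_next = (s + (1 if a == 1 else -1)) % n_states
    r = 1.0 if s_next == 0 else 0.0
    return s_next, r

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(5000):
    a = behavior_policy(s)
    s_next, r = step(s, a)
    # Standard Q-learning target; the robust version would take a
    # worst-case value over the transition uncertainty set instead of
    # evaluating at the single observed next state.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q.round(2))
```

As in the paper's setup, averaging such runs over 30 independent trajectories (different seeds) would smooth the learning curves before plotting.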