Policy Gradient Method For Robust Reinforcement Learning

Authors: Yue Wang, Shaofeng Zou

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Finally, we provide simulation results to demonstrate the robustness of our methods. |
| Researcher Affiliation | Academia | Department of Electrical Engineering, University at Buffalo, New York, USA. Correspondence to: Shaofeng Zou <szou3@buffalo.edu>. |
| Pseudocode | Yes | Algorithm 1 Robust Policy Gradient |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the authors' source code for the described methodology is publicly available. |
| Open Datasets | Yes | We test our algorithms on the Garnet problem (Archibald et al., 1995) and the Taxi environment from Open AI (Brockman et al., 2016). |
| Dataset Splits | No | The paper describes testing algorithms on environments (Garnet, Taxi) but does not provide specific dataset split information (e.g., percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as exact GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions general software concepts like 'neural network parameterized policy' but does not provide specific software dependency details with version numbers (e.g., library names with specific versions). |
| Experiment Setup | Yes | In this section, we consider Garnet problem G(30, 20) using neural network parameterized policy, where we use a two-layer neural network with 15 neurons in the hidden layer to parameterize the policy πθ. We then use a two-layer neural network (with 20 neurons in the hidden layer) in the critic. |
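
The Open Datasets row names the two test environments. The sketch below shows one way to instantiate them; it is a minimal sketch, not the authors' code. The Garnet generator (branching factor, reward distribution) and the `Taxi-v3` Gym environment ID are assumptions, since the paper only references Archibald et al. (1995) and Brockman et al. (2016).

```python
# Minimal sketch of the two test environments referenced in the paper.
# The Garnet construction below is a common G(n_states, n_actions) variant;
# the branching factor and reward distribution are assumptions.
import numpy as np
import gym  # OpenAI Gym (Brockman et al., 2016); gymnasium also provides Taxi-v3


def make_garnet(n_states=30, n_actions=20, branching=5, seed=0):
    """Randomly generated MDP ("Garnet") with |S| = n_states, |A| = n_actions."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))  # transition kernel P(s' | s, a)
    R = rng.uniform(size=(n_states, n_actions))    # random rewards r(s, a)
    for s in range(n_states):
        for a in range(n_actions):
            # Each (s, a) pair transitions to a small random set of next states.
            next_states = rng.choice(n_states, size=branching, replace=False)
            P[s, a, next_states] = rng.dirichlet(np.ones(branching))
    return P, R


P, R = make_garnet()              # Garnet problem G(30, 20)
taxi_env = gym.make("Taxi-v3")    # Taxi environment (assumed environment ID)
```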
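
The Experiment Setup row fixes the network sizes (15 hidden units for the policy, 20 for the critic) but not the activation functions, state encoding, or training details. The following PyTorch sketch fills in those gaps with assumed choices and is not the authors' implementation.

```python
# Sketch of the policy/critic parameterization described in the Experiment
# Setup row: two-layer networks with 15 (policy) and 20 (critic) hidden units.
# ReLU activations and one-hot state encoding are assumptions.
import torch
import torch.nn as nn

n_states, n_actions = 30, 20  # Garnet problem G(30, 20)

policy_net = nn.Sequential(   # pi_theta: state features -> action probabilities
    nn.Linear(n_states, 15),
    nn.ReLU(),
    nn.Linear(15, n_actions),
    nn.Softmax(dim=-1),
)

critic_net = nn.Sequential(   # critic: state features -> scalar value estimate
    nn.Linear(n_states, 20),
    nn.ReLU(),
    nn.Linear(20, 1),
)

state = torch.eye(n_states)[0]    # one-hot encoding of state 0 (assumed)
action_probs = policy_net(state)  # pi_theta(. | s)
value = critic_net(state)         # estimated value of s
```

The softmax head makes the policy network output a distribution over the 20 actions directly, which is one common way to realize a "neural network parameterized policy" for a finite action space.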