Policy Gradient Method For Robust Reinforcement Learning
Authors: Yue Wang, Shaofeng Zou
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide simulation results to demonstrate the robustness of our methods. |
| Researcher Affiliation | Academia | 1Department of Electrical Engineering, University at Buffalo, New York, USA. Correspondence to: Shaofeng Zou <szou3@buffalo.edu>. |
| Pseudocode | Yes | Algorithm 1 Robust Policy Gradient (an illustrative policy-gradient sketch follows the table) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the authors' source code for the described methodology is publicly available. |
| Open Datasets | Yes | We test our algorithms on the Garnet problem (Archibald et al., 1995) and the Taxi environment from OpenAI (Brockman et al., 2016). (An environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper describes testing the algorithms on environments (Garnet, Taxi) but does not provide dataset split information (e.g., percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as exact GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions general software concepts like 'neural network parameterized policy' but does not list software dependencies with version numbers (e.g., specific library versions). |
| Experiment Setup | Yes | In this section, we consider Garnet problem G(30, 20) using neural network parameterized policy, where we use a two-layer neural network with 15 neurons in the hidden layer to parameterize the policy πθ. We then use a two-layer neural network (with 20 neurons in the hidden layer) in the critic. (An architecture sketch follows the table.) |
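The Open Datasets row points to the two test environments. As a minimal sketch (not the authors' code), the snippet below builds a Garnet-style random MDP G(30, 20) and loads the Gym Taxi environment; the branching factor `b`, the uniform reward range, and the `Taxi-v3` version id are assumptions not stated in the paper.

```python
# Sketch only: a Garnet-style random MDP G(n_states, n_actions) and the Gym Taxi task.
# The branching factor `b`, the reward range, and the "Taxi-v3" id are assumptions.
import numpy as np
import gym

def make_garnet(n_states=30, n_actions=20, b=5, seed=0):
    """Random Garnet MDP: each (s, a) pair transitions to b randomly chosen successor states."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_states, n_actions, n_states))          # transition kernel P[s, a, s']
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))  # random rewards
    for s in range(n_states):
        for a in range(n_actions):
            succ = rng.choice(n_states, size=b, replace=False)
            P[s, a, succ] = rng.dirichlet(np.ones(b))      # random probabilities over successors
    return P, R

P, R = make_garnet()        # Garnet problem G(30, 20)
env = gym.make("Taxi-v3")   # Taxi environment from OpenAI Gym (Brockman et al., 2016)
```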
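The Experiment Setup row quotes the network sizes used in the Garnet experiment. The sketch below (not the authors' code) instantiates a two-layer policy network with 15 hidden neurons and a two-layer critic with 20 hidden neurons for G(30, 20); the one-hot state encoding, tanh activations, and softmax output head are assumptions.

```python
# Sketch only: the two-layer policy (15 hidden neurons) and critic (20 hidden neurons)
# described for Garnet G(30, 20). One-hot inputs, tanh, and softmax are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_STATES, N_ACTIONS = 30, 20    # Garnet problem G(30, 20)

policy = nn.Sequential(         # pi_theta: state -> probability distribution over actions
    nn.Linear(N_STATES, 15),    # hidden layer with 15 neurons (as quoted above)
    nn.Tanh(),
    nn.Linear(15, N_ACTIONS),
    nn.Softmax(dim=-1),
)

critic = nn.Sequential(         # state-value estimate used by the critic
    nn.Linear(N_STATES, 20),    # hidden layer with 20 neurons (as quoted above)
    nn.Tanh(),
    nn.Linear(20, 1),
)

state = F.one_hot(torch.tensor(3), N_STATES).float()   # example: state index 3
action_probs = policy(state)                            # shape (N_ACTIONS,)
value = critic(state)                                   # shape (1,)
```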
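For orientation on the Pseudocode row, the following sketch shows a standard (non-robust) policy gradient update. It is not the paper's Algorithm 1, which works with a worst-case (robust) value over an uncertainty set of transition kernels rather than the nominal returns used here.

```python
# Sketch only: one vanilla (non-robust) policy-gradient step on E[log pi_theta(a|s) * G],
# shown to illustrate the policy-gradient family; the paper's Robust Policy Gradient
# instead differentiates a robust (worst-case) value function.
import torch

def policy_gradient_step(policy, optimizer, states, actions, returns):
    """One gradient-ascent step; states: (B, n_states), actions: (B,), returns: (B,)."""
    probs = policy(states)                                           # (B, n_actions)
    log_probs = torch.log(probs.gather(1, actions.unsqueeze(1))).squeeze(1)
    loss = -(log_probs * returns).mean()                             # negate for ascent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (with the policy sketched above): optimizer = torch.optim.SGD(policy.parameters(), lr=0.01)
```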