Robust Policy Gradient against Strong Data Corruption
Authors: Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Complementary to the theoretical results, we show that a neural implementation of FPG achieves strong robust learning performance on the MuJoCo continuous control benchmarks. ... The experiment results are shown in Figure 1. |
| Researcher Affiliation | Academia | 1Department of Computer Sciences, University of Wisconsin-Madison; 2Cornell University. |
| Pseudocode | Yes | Algorithm 1: d^π_ν sampler and Q^π estimator; Algorithm 2: Natural Policy Gradient (NPG); Algorithm 3: Robust Linear Regression via SEVER |
| Open Source Code | No | The paper does not provide a statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | we show that a neural implementation of FPG achieves strong robust learning performance on the MuJoCo continuous control benchmarks (Todorov et al., 2012) |
| Dataset Splits | No | The paper mentions using MuJoCo benchmarks but does not specify details regarding training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions software like TRPO, PyTorch, and SEVER but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Throughout the experiment, we set the contamination level ε = 0.01, and tune δ among the values of [1, 2, 4, 8, 16, 32, 64]... All experiments are repeated with 3 random seeds and the mean and standard deviations are plotted in the figures. ... The pseudo-code and implementation details are discussed in appendix G. |
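The pseudocode entry above names "Robust Linear Regression via SEVER" (Algorithm 3) as a subroutine. As a rough illustration of the SEVER-style filtering idea (not the paper's exact algorithm; iteration count, removal fraction, and the plain least-squares base learner are assumptions here), one can iteratively fit, score points by their projection onto the top singular vector of the centered per-point gradients, and discard the most extreme fraction:

```python
import numpy as np

def sever_linear_regression(X, y, eps=0.05, iters=4):
    """Sketch of SEVER-style robust linear regression (hyperparameters assumed).

    Repeat: fit least squares, compute per-point gradients of the squared
    loss, score each point by its squared projection onto the top singular
    vector of the centered gradient matrix, and drop the top-eps fraction.
    """
    idx = np.arange(len(y))
    for _ in range(iters):
        Xs, ys = X[idx], y[idx]
        # Base learner: ordinary least squares on the currently kept points.
        w, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = Xs @ w - ys                    # per-point residuals
        grads = resid[:, None] * Xs            # per-point gradients of 0.5 * resid^2
        G = grads - grads.mean(axis=0)         # center the gradients
        _, _, vt = np.linalg.svd(G, full_matrices=False)
        scores = (G @ vt[0]) ** 2              # outlier score along top direction
        # Keep the (1 - eps) fraction with the smallest scores.
        keep = scores.argsort()[: int(np.ceil((1 - eps) * len(idx)))]
        idx = idx[keep]
    return w
```

Corrupted points carry anomalously large gradients, so they dominate the top singular direction and receive high scores, which is why removing the highest-scoring fraction tends to strip the contamination before it biases the fit.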