Revisiting Domain Randomization via Relaxed State-Adversarial Policy Optimization
Authors: Yun-Hsuan Lien, Ping-Chun Hsieh, Yu-Shuen Wang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method by comparing it to state-of-the-art methods, providing experimental results and theoretical proofs to verify its effectiveness. Our source code and appendix are available at https://github.com/sophialien/RAPPO. We performed two experiments on the MuJoCo platform (Todorov et al., 2012) to assess the performance of our relaxed state-adversarial policy optimization (RAPPO) against various adversaries. |
| Researcher Affiliation | Academia | Yun-Hsuan Lien, Ping-Chun Hsieh, Yu-Shuen Wang; National Yang Ming Chiao Tung University, Hsinchu, Taiwan. Correspondence to: Yun-Hsuan Lien <sophia.yh.lien@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 outlines the steps of our approach. |
| Open Source Code | Yes | Our source code and appendix are available at https://github.com/sophialien/RAPPO. |
| Open Datasets | Yes | We performed two experiments on the MuJoCo platform (Todorov et al., 2012) to assess the performance of our relaxed state-adversarial policy optimization (RAPPO) against various adversaries. |
| Dataset Splits | No | The paper does not describe train/validation/test dataset splits; the experiments use MuJoCo benchmark environments rather than fixed datasets. |
| Hardware Specification | No | No specific hardware (GPU model, CPU model, memory) is mentioned for running the experiments. |
| Software Dependencies | No | The paper does not list software dependencies or version information. |
| Experiment Setup | Yes | We set [the perturbation budget] to 0.015, 0.002, 0.002, 0.03, and 0.005 for the HalfCheetah-v2, Hopper-v2, Ant-v2, Walker2d-v2, and Humanoid-v2 environments, respectively. These values were chosen based on the mean magnitude of actions taken in each environment. The baselines and our method were implemented using the PPO algorithm (Schulman et al., 2017), and the default parameters were used. |
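
The experiment-setup row lends itself to a small configuration sketch. The snippet below is a minimal illustration, not the authors' implementation (that lives at https://github.com/sophialien/RAPPO): it assumes the legacy `gym` + `mujoco-py` stack that provides the `-v2` MuJoCo environment IDs and `stable-baselines3` for "PPO with the default parameters", and a hypothetical uniform observation-perturbation wrapper stands in for RAPPO's relaxed state-adversarial objective. Only the environment names and per-environment budgets are taken from the paper.

```python
# Minimal sketch of the experiment setup quoted above -- NOT the official RAPPO code
# (see https://github.com/sophialien/RAPPO). Assumptions: legacy gym + mujoco-py
# (for the -v2 environment IDs) and stable-baselines3 for "PPO with default parameters".
import gym
import numpy as np
from stable_baselines3 import PPO

# Per-environment perturbation budgets quoted in the Experiment Setup row.
EPSILONS = {
    "HalfCheetah-v2": 0.015,
    "Hopper-v2": 0.002,
    "Ant-v2": 0.002,
    "Walker2d-v2": 0.03,
    "Humanoid-v2": 0.005,
}


class UniformStatePerturbation(gym.ObservationWrapper):
    """Hypothetical stand-in for a state adversary: adds bounded uniform noise.

    RAPPO optimizes against a relaxed worst-case state adversary; random noise
    is used here only to show where the budget `eps` enters the pipeline.
    """

    def __init__(self, env, eps):
        super().__init__(env)
        self.eps = eps

    def observation(self, observation):
        noise = np.random.uniform(-self.eps, self.eps, size=observation.shape)
        return (observation + noise).astype(observation.dtype)


def train(env_id: str = "Hopper-v2", total_timesteps: int = 1_000_000) -> PPO:
    env = UniformStatePerturbation(gym.make(env_id), EPSILONS[env_id])
    model = PPO("MlpPolicy", env, verbose=1)  # default PPO hyperparameters
    model.learn(total_timesteps=total_timesteps)
    return model


if __name__ == "__main__":
    train()
```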