Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula
Authors: Aryaman Reddi, Maximilian Tölle, Jan Peters, Georgia Chalvatzaki, Carlo D'Eramo
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide extensive evidence of QARL outperforming RARL and recent baselines across several MuJoCo locomotion and navigation problems in overall performance and robustness. ... We demonstrate that our approach facilitates the learning of the protagonist, and we show that QARL outperforms RARL and related baselines in terms of performance and robustness across several high-dimensional MuJoCo problems, such as navigation and locomotion, of the DeepMind Control Suite (Todorov et al., 2012; Tunyasuvunakool et al., 2020). |
| Researcher Affiliation | Academia | 1 Department of Computer Science, TU Darmstadt, Germany; 2 Hessian Center for Artificial Intelligence (Hessian.ai), Germany; 3 German Research Center for AI (DFKI), Systems AI for Robot Learning, Germany; 4 Center for Cognitive Science, TU Darmstadt, Germany; 5 Center for Artificial Intelligence and Data Science, University of Würzburg, Germany |
| Pseudocode | Yes | Algorithm 1 Quantal Adversarial Reinforcement Learning (QARL)... Algorithm 2 Force-based curriculum adversarial reinforcement learning (Force-Curriculum) |
| Open Source Code | Yes | Code available at https://github.com/AryamanReddi99/quantal-adversarial-rl |
| Open Datasets | Yes | We consider a broad set of MuJoCo control problems (Todorov et al., 2012) from the DeepMind Control Suite (Tunyasuvunakool et al., 2020) |
| Dataset Splits | No | The paper describes environments and experimental settings but does not provide specific numerical train/validation/test splits for any dataset. It mentions "training steps" and "test time" but not how the overall dataset is split. |
| Hardware Specification | Yes | The experiments are carried out on a computational cluster with 64GB of RAM and an AMD Ryzen 9 16-Core processor. |
| Software Dependencies | No | The algorithms investigated are implemented using the MushroomRL (D'Eramo et al., 2021) library, which is also used for the implementation of all agents and adversarial environment wrappers. The specific version number of MushroomRL is not provided. |
| Experiment Setup | Yes | The hyperparameters of the adversarial agents across all environments are shown in Table 2. ... These environment-specific parameters are shown in Table 3... The algorithm parameters for Curriculum Adversarial Training (CAT) in Table 4 denote the iterations between which the gradient of the linear force budget curriculum would be constant and non-zero... |
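
The pseudocode row above references Algorithm 1 (QARL), whose core mechanism is a curriculum over the adversary's bounded rationality: the adversary follows a quantal response (Boltzmann) model whose temperature is annealed from high (a near-random, boundedly rational adversary) to low (approaching the fully rational worst-case adversary of RARL). The sketch below illustrates that mechanism on a discrete action set; the geometric temperature schedule, function names, and discrete-action setting are illustrative assumptions, not the paper's continuous-control implementation.

```python
import numpy as np

def adversary_temperature(step, total_steps, t_start=10.0, t_end=0.01):
    """Geometrically anneal the adversary's Boltzmann temperature.

    High temperature -> near-uniform (boundedly rational) adversary;
    low temperature -> near-optimal worst-case adversary. The geometric
    schedule is an illustrative assumption, not the paper's exact one.
    """
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac

def quantal_response(q_values, temperature):
    """Boltzmann (quantal response) distribution over adversary actions."""
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# The same adversarial Q-values induce very different behaviour at the
# two ends of the curriculum.
q = [1.0, 0.5, -0.2]
print(quantal_response(q, adversary_temperature(0, 100)))   # near-uniform
print(quantal_response(q, adversary_temperature(99, 100)))  # near-greedy
```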
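
The open-datasets row refers to MuJoCo control problems from the DeepMind Control Suite. A minimal way to instantiate such a task is through the dm_control package, sketched below; the walker-walk task and the random placeholder policy are illustrative choices, not the paper's exact setup, which wraps these environments through MushroomRL with adversarial perturbations.

```python
import numpy as np
from dm_control import suite

# Load one illustrative DeepMind Control Suite task; the paper evaluates
# a broader set of locomotion and navigation problems.
env = suite.load(domain_name='walker', task_name='walk')
action_spec = env.action_spec()

time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    # Uniform random action as a placeholder for a trained protagonist
    # policy; QARL would additionally apply adversarial forces here.
    action = np.random.uniform(action_spec.minimum,
                               action_spec.maximum,
                               size=action_spec.shape)
    time_step = env.step(action)
    episode_return += time_step.reward
print(f'episode return: {episode_return:.2f}')
```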