Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
Authors: Feng Gao, Liangzhi Shi, Shenao Zhang, Zhaoran Wang, Yi Wu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the theoretical side, we demonstrate AGPO's convergence, emphasizing its stable performance under non-smooth dynamics due to low variance. On the empirical side, our results show that AGPO effectively mitigates the challenges posed by non-smoothness in policy learning through differentiable simulation. |
| Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (2) Northwestern University, Illinois, United States; (3) Shanghai Qi Zhi Institute, Shanghai, China. |
| Pseudocode | Yes | Algorithm 1 Adaptive-Gradient Policy Optimization |
| Open Source Code | No | The paper states "We implemented our algorithm using JAX (Bradbury et al., 2018) for empirical analysis." but gives no explicit statement that the code is released and no link to a repository for the described methodology. |
| Open Datasets | Yes | We employed the canonical Ant task from Brax (Freeman et al., 2021). |
| Dataset Splits | No | The paper details episode lengths, number of runs, and parallel environments (e.g., "num_envs = 64", "num_eval_envs = 128" in Table 1) for the simulation setup, but it does not provide explicit training/validation/test splits (e.g., percentages or sample counts) for a fixed dataset; such splits are standard in supervised learning but less applicable in RL, where data is generated by interacting with the environment. |
| Hardware Specification | Yes | We conducted our experiments on one NVIDIA GeForce RTX 3090 GPU with 24 GB GDDR6X memory. |
| Software Dependencies | Yes | We implemented our codes on the JAX framework, supporting XLA and automatic differentiation. ... We utilize the implementations provided by Stable Baselines3 (Raffin et al., 2021) and add a custom wrapper for our simulation environments. ... By leveraging auto-differentiation tools like PyTorch (Paszke et al., 2019) and JAX (Bradbury et al., 2018) or specially crafted differentiable kernels (Xu et al., 2022)... (see the wrapper sketch after the table) |
| Experiment Setup | Yes | Table 1. Training hyper-parameters for AGPO. (Includes specific values for learning rate, batch size, hidden sizes, discount factor, etc.) |
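
The Software Dependencies row notes that the authors run JAX/Brax simulations and "add a custom wrapper" so that Stable Baselines3 baselines can train on the same environments. The snippet below is a minimal, hypothetical sketch of what such an adapter could look like; the class name `BraxToGymWrapper` and the PPO training call are illustrative assumptions, not the authors' released code, and the exact Brax API (`brax.envs.create`, the `obs`/`reward`/`done` fields on the state) varies slightly across Brax versions.

```python
# Illustrative sketch only: adapt a Brax environment to the gymnasium interface
# expected by Stable Baselines3. Names and defaults here are assumptions.
import gymnasium as gym
import jax
import jax.numpy as jnp
import numpy as np
from brax import envs
from stable_baselines3 import PPO


class BraxToGymWrapper(gym.Env):
    """Minimal single-environment adapter from Brax's functional API to gymnasium."""

    def __init__(self, env_name: str = "ant", seed: int = 0):
        self._env = envs.create(env_name=env_name)   # Brax environment factory
        self._rng = jax.random.PRNGKey(seed)
        self._reset_fn = jax.jit(self._env.reset)    # JIT-compile reset/step once
        self._step_fn = jax.jit(self._env.step)
        obs_dim = self._env.observation_size
        act_dim = self._env.action_size
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(act_dim,), dtype=np.float32)
        self._state = None

    def reset(self, *, seed=None, options=None):
        self._rng, key = jax.random.split(self._rng)
        self._state = self._reset_fn(key)
        return np.asarray(self._state.obs, dtype=np.float32), {}

    def step(self, action):
        self._state = self._step_fn(self._state, jnp.asarray(action))
        obs = np.asarray(self._state.obs, dtype=np.float32)
        reward = float(self._state.reward)
        terminated = bool(float(self._state.done))
        return obs, reward, terminated, False, {}


if __name__ == "__main__":
    env = BraxToGymWrapper("ant")
    model = PPO("MlpPolicy", env, verbose=1)   # standard SB3 baseline, not AGPO itself
    model.learn(total_timesteps=10_000)
```

Note that a wrapper like this only serves the Stable Baselines3 baselines; AGPO itself, as described in the paper, differentiates through the simulator directly in JAX rather than going through a Gym-style step interface.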