Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations

Authors: Feng Gao, Liangzhi Shi, Shenao Zhang, Zhaoran Wang, Yi Wu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the theoretical side, we demonstrate AGPO's convergence, emphasizing its stable performance under non-smooth dynamics due to low variance. On the empirical side, our results show that AGPO effectively mitigates the challenges posed by non-smoothness in policy learning through differentiable simulation.
Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (2) Northwestern University, Illinois, United States; (3) Shanghai Qi Zhi Institute, Shanghai, China.
Pseudocode | Yes | Algorithm 1: Adaptive-Gradient Policy Optimization. (The algorithm is not reproduced here; a hypothetical sketch of such an adaptive update appears after this table.)
Open Source Code | No | The paper states "We implemented our algorithm using JAX (Bradbury et al., 2018) for empirical analysis." but does not state that the code is released or provide a link to a repository for the described methodology.
Open Datasets | Yes | We employed the canonical Ant task from Brax (Freeman et al., 2021). (A minimal Brax setup sketch follows this table.)
Dataset Splits | No | The paper details episode lengths, number of runs, and parallel environments (e.g., "num envs = 64", "num eval envs = 128" in Table 1), but it does not report explicit training/validation/test splits. Such splits are standard in supervised learning but less common in RL, where data is generated by interacting with the environment.
Hardware Specification | Yes | We conducted our experiments on one NVIDIA GeForce RTX 3090 GPU with 24 GB GDDR6X memory.
Software Dependencies | Yes | We implemented our codes on the JAX framework, supporting XLA and automatic differentiation. ... We utilize the implementations provided by Stable Baselines3 (Raffin et al., 2021) and add a custom wrapper for our simulation environments. ... By leveraging auto-differentiation tools like PyTorch (Paszke et al., 2019) and JAX (Bradbury et al., 2018) or specially crafted differentiable kernels (Xu et al., 2022)... (A hypothetical minimal wrapper is sketched after this table.)
Experiment Setup | Yes | Table 1. Training hyper-parameters for AGPO. (Includes specific values for learning rate, batch size, hidden sizes, discount factor, etc. A placeholder configuration sketch follows this table.)
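
Algorithm 1 itself is not reproduced in this summary. As a rough illustration only, the JAX sketch below assumes an AGPO-style update that adaptively blends a first-order gradient obtained by differentiating through the simulator with a zeroth-order (evolution-strategies-style) estimate whenever the analytic gradient looks unreliable, e.g., when its empirical variance is large under non-smooth dynamics. The toy dynamics, the variance criterion, and the mixing rule are all hypothetical and are not the paper's Algorithm 1.

```python
import jax
import jax.numpy as jnp

def simulate_return(theta, x0, key, horizon=16):
    """Toy differentiable rollout: linear policy on 1-D dynamics with a
    non-smooth (clipped) transition; returns the cumulative reward."""
    def step(carry, _):
        x, key = carry
        key, sub = jax.random.split(key)
        u = theta * x + 0.01 * jax.random.normal(sub)   # stochastic policy
        x_next = jnp.clip(x + u, -1.0, 1.0)              # non-smooth dynamics
        reward = -(x_next ** 2)
        return (x_next, key), reward
    _, rewards = jax.lax.scan(step, (x0, key), None, length=horizon)
    return rewards.sum()

def first_order_grad(theta, x0, keys):
    """Analytic gradient through the simulator, averaged over rollouts."""
    g = jax.vmap(jax.grad(simulate_return), in_axes=(None, None, 0))(theta, x0, keys)
    return g.mean(), g.var()

def zeroth_order_grad(theta, x0, keys, sigma=0.05):
    """Simple ES-style estimate that requires no simulator gradient."""
    eps = jax.random.normal(keys[0], (keys.shape[0],))
    f = jax.vmap(lambda e, k: simulate_return(theta + sigma * e, x0, k))(eps, keys)
    return (f * eps).mean() / sigma

def adaptive_update(theta, x0, key, lr=1e-2, var_threshold=1.0):
    """Hypothetical adaptive rule: prefer the first-order gradient, fall back
    to the zeroth-order estimate when its empirical variance is too large."""
    keys = jax.random.split(key, 32)
    g1, v1 = first_order_grad(theta, x0, keys)
    g = jnp.where(v1 < var_threshold, g1, zeroth_order_grad(theta, x0, keys))
    return theta + lr * g
```

For example, `adaptive_update(0.1, jnp.float32(0.5), jax.random.PRNGKey(0))` performs one such update on the toy problem.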
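
The Ant task referenced above ships with Brax, so no separate dataset download is needed. The snippet below is a minimal setup sketch (not the authors' code) showing how the reported 64 training and 128 evaluation environments could be instantiated as batched, JIT-compiled resets and steps; the `episode_length` value here is illustrative.

```python
import jax
import jax.numpy as jnp
from brax import envs

# Canonical Ant task from Brax (Freeman et al., 2021).
env = envs.create(env_name="ant", episode_length=1000)

# Batched resets: 64 parallel training envs and 128 evaluation envs,
# matching the num_envs / num_eval_envs values quoted from Table 1.
reset_batch = jax.jit(jax.vmap(env.reset))
train_keys = jax.random.split(jax.random.PRNGKey(0), 64)
eval_keys = jax.random.split(jax.random.PRNGKey(1), 128)
train_states = reset_batch(train_keys)
eval_states = reset_batch(eval_keys)

# One batched simulation step with zero actions, just to show the interface.
step_batch = jax.jit(jax.vmap(env.step))
actions = jnp.zeros((64, env.action_size))
train_states = step_batch(train_states, actions)
```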
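
For the Stable Baselines3 baselines, the paper mentions a custom wrapper around the simulation environments but does not publish it. The class below is a hypothetical minimal version (single environment, gymnasium-style API), not the authors' implementation; Brax itself also provides Gym-style wrappers, which would be the more direct route.

```python
import gymnasium as gym
import jax
import jax.numpy as jnp
import numpy as np
from brax import envs
from stable_baselines3 import PPO

class BraxToGymEnv(gym.Env):
    """Hypothetical minimal wrapper exposing a single Brax env to Stable Baselines3."""

    def __init__(self, env_name="ant", seed=0):
        self._env = envs.create(env_name=env_name, episode_length=1000)
        self._reset_fn = jax.jit(self._env.reset)
        self._step_fn = jax.jit(self._env.step)
        self._key = jax.random.PRNGKey(seed)
        obs_dim, act_dim = self._env.observation_size, self._env.action_size
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, (act_dim,), np.float32)

    def reset(self, *, seed=None, options=None):
        self._key, sub = jax.random.split(self._key)
        self._state = self._reset_fn(sub)
        return np.asarray(self._state.obs, dtype=np.float32), {}

    def step(self, action):
        self._state = self._step_fn(self._state, jnp.asarray(action, dtype=jnp.float32))
        obs = np.asarray(self._state.obs, dtype=np.float32)
        # Brax folds time limits into `done`, so truncation is not separated here.
        return obs, float(self._state.reward), bool(self._state.done), False, {}

# Train a PPO baseline on the wrapped environment.
model = PPO("MlpPolicy", BraxToGymEnv(), verbose=1)
model.learn(total_timesteps=10_000)
```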
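
Table 1 is not reproduced here. The configuration sketch below only illustrates how such a hyper-parameter table is typically encoded in code; apart from num_envs = 64 and num_eval_envs = 128, which are quoted above, every value is a placeholder rather than the paper's setting.

```python
from dataclasses import dataclass

@dataclass
class AGPOConfig:
    # Quoted in the paper's Table 1.
    num_envs: int = 64
    num_eval_envs: int = 128
    # Placeholder values; consult Table 1 of the paper for the actual settings.
    learning_rate: float = 3e-4
    batch_size: int = 1024
    hidden_sizes: tuple = (256, 256)
    discount: float = 0.99
```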