ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization

Authors: Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe Xu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and efficient sample efficiency of our approach. Benchmark results and videos are available at https://ace-rl.github.io/. We conduct a comprehensive evaluation of our method across 28 diverse continuous control tasks, spanning 7 domains.
Researcher Affiliation | Academia | (1) Tsinghua University, (2) University of Maryland, (3) Beijing Technology and Business University, (4) Shanghai Qi Zhi Institute. Correspondence to: Tianying Ji <jity20@mails.tsinghua.edu.cn>, Yongyuan Liang <cheryllLiang@outlook.com>.
Pseudocode | Yes | The pseudocode of our proposed ACE is provided in Algorithm 1.
Open Source Code | No | Benchmark results and videos are available at https://ace-rl.github.io/. This statement does not explicitly confirm the availability of source code for the methodology.
Open Datasets | Yes | We evaluate ACE across 29 diverse continuous control tasks spanning 7 task domains using a single set of hyperparameters: MuJoCo (Todorov et al., 2012a), Meta-World (Yu et al., 2019a), DeepMind Control Suite (Tassa et al., 2018a), Adroit (Rajeswaran et al., 2018), Shadow Dexterous Hand (Plappert et al., 2018), Panda-gym (Gallouédec et al., 2021b), and ROBEL (Ahn et al., 2020b).
Dataset Splits | Yes | We evaluate ACE across 29 diverse continuous control tasks spanning 7 task domains using a single set of hyperparameters: MuJoCo (Todorov et al., 2012a), Meta-World (Yu et al., 2019a), DeepMind Control Suite (Tassa et al., 2018a), Adroit (Rajeswaran et al., 2018), Shadow Dexterous Hand (Plappert et al., 2018), Panda-gym (Gallouédec et al., 2021b), and ROBEL (Ahn et al., 2020b). These are standard benchmark suites with predefined splits.
Hardware Specification | Yes | Our experiments were conducted on a server equipped with an AMD EPYC 7763 64-Core Processor (256 threads) and four NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or frameworks.
Experiment Setup | Yes | The hyperparameters used for training ACE are outlined in Table 2. We conduct all experiments with this single set of hyperparameters. Hyperparameter / value: discount factor γ = 0.99, soft update factor τ = 0.005, learning rate α = 0.0003, batch size N = 512, policy updates per step = 1, value target update interval = 2, sample size for causality Nc = 10000, causality computation interval I = 10000, max reset factor αmax = 0.8, reset interval = 200000, dormancy threshold τ = 0.025.
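
The reported Table 2 values translate directly into a training configuration object. The sketch below is a minimal illustration only: the dataclass, field names, and inline descriptions are our own assumptions rather than the authors' released code, and the numeric values are simply those quoted above.

```python
from dataclasses import dataclass


@dataclass
class ACEConfig:
    """Hypothetical container for the single hyperparameter set reported in Table 2."""

    # Off-policy actor-critic (SAC-style) base hyperparameters
    gamma: float = 0.99            # discount factor
    tau: float = 0.005             # soft (Polyak) update factor for target networks
    lr: float = 3e-4               # learning rate alpha
    batch_size: int = 512
    policy_updates_per_step: int = 1
    target_update_interval: int = 2

    # Causality-aware entropy regularization settings
    causal_sample_size: int = 10_000      # N_c: sample size for causality estimation
    causal_update_interval: int = 10_000  # I: steps between causality computations

    # Reset-related settings
    max_reset_factor: float = 0.8         # alpha_max
    reset_interval: int = 200_000
    dormancy_threshold: float = 0.025


if __name__ == "__main__":
    # Example usage: one config shared across all 29 tasks, per the paper's claim
    config = ACEConfig()
    print(config.gamma, config.causal_sample_size, config.reset_interval)
```

Keeping every task domain on the same frozen configuration, as the paper states, makes the reproducibility claim easy to audit: any deviation between runs must come from the environment or the seed, not from per-task tuning.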