ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
Authors: Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe Xu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and efficient sample efficiency of our approach. Benchmark results and videos are available at https://ace-rl.github.io/. We conduct a comprehensive evaluation of our method across 28 diverse continuous control tasks, spanning 7 domains. |
| Researcher Affiliation | Academia | 1 Tsinghua University, 2 University of Maryland, 3 Beijing Technology and Business University, 4 Shanghai Qi Zhi Institute. Correspondence to: Tianying Ji <jity20@mails.tsinghua.edu.cn>, Yongyuan Liang <cheryllLiang@outlook.com>. |
| Pseudocode | Yes | The pseudocode of our proposed ACE is provided in Algorithm 1. |
| Open Source Code | No | Benchmark results and videos are available at https://ace-rl.github.io/. This statement does not explicitly confirm the availability of source code for the methodology. |
| Open Datasets | Yes | We evaluate ACE across 29 diverse continuous control tasks spanning 7 task domains using a single set of hyperparameters: MuJoCo (Todorov et al., 2012a), Meta-World (Yu et al., 2019a), DeepMind Control Suite (Tassa et al., 2018a), Adroit (Rajeswaran et al., 2018), Shadow Dexterous Hand (Plappert et al., 2018), Panda-gym (Gallouédec et al., 2021b), and ROBEL (Ahn et al., 2020b). |
| Dataset Splits | Yes | We evaluate ACE across 29 diverse continuous control tasks spanning 7 task domains using a single set of hyperparameters: MuJoCo (Todorov et al., 2012a), Meta-World (Yu et al., 2019a), DeepMind Control Suite (Tassa et al., 2018a), Adroit (Rajeswaran et al., 2018), Shadow Dexterous Hand (Plappert et al., 2018), Panda-gym (Gallouédec et al., 2021b), and ROBEL (Ahn et al., 2020b). These are standard benchmark suites with predefined splits. |
| Hardware Specification | Yes | Our experiments were conducted on a server equipped with an AMD EPYC 7763 64-Core Processor (256 threads) and four NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or frameworks. |
| Experiment Setup | Yes | The hyperparameters used for training ACE are outlined in Table 2. We conduct all experiments with this single set of hyperparameters: discount factor γ = 0.99, soft update factor τ = 0.005, learning rate α = 0.0003, batch size N = 512, policy updates per step = 1, value target update interval = 2, sample size for causality Nc = 10000, causality computation interval I = 10000, max reset factor αmax = 0.8, reset interval = 200000, dormancy threshold τ = 0.025 (see the configuration sketch below the table). |
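For convenience, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is a minimal, hypothetical Python dataclass: the values come directly from the row above, but the class and field names are assumptions and do not reflect the authors' released code or API.

```python
# Minimal configuration sketch for the ACE training setup.
# Values are copied from the Experiment Setup row above; the structure and
# field names are illustrative assumptions, not the authors' actual code.
from dataclasses import dataclass


@dataclass
class ACEConfig:
    gamma: float = 0.99                  # discount factor
    tau: float = 0.005                   # soft (Polyak) update factor for target networks
    lr: float = 3e-4                     # learning rate
    batch_size: int = 512                # minibatch size sampled from the replay buffer
    policy_updates_per_step: int = 1     # gradient updates per environment step
    target_update_interval: int = 2      # value target update interval
    causality_sample_size: int = 10_000  # sample size for causality estimation (N_c)
    causality_interval: int = 10_000     # steps between causal-weight recomputations (I)
    max_reset_factor: float = 0.8        # maximum reset factor (alpha_max)
    reset_interval: int = 200_000        # steps between dormancy-guided reset checks
    dormancy_threshold: float = 0.025    # gradient dormancy threshold


config = ACEConfig()  # single hyperparameter set used across all 7 task domains
```

A reproduction script would typically instantiate this object once and pass it to the actor, critic, and causal-weight estimation routines, since the paper reports using one hyperparameter set across all benchmark suites.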