Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
Authors: Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe Xu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and efficient sample efficiency of our approach. Benchmark results and videos are available at https://ace-rl.github.io/. We conduct a comprehensive evaluation of our method across 28 diverse continuous control tasks, spanning 7 domains. |
| Researcher Affiliation | Academia | ¹Tsinghua University, ²University of Maryland, ³Beijing Technology and Business University, ⁴Shanghai Qi Zhi Institute. Correspondence to: Tianying Ji <EMAIL>, Yongyuan Liang <EMAIL>. |
| Pseudocode | Yes | The pseudocode of our proposed ACE is provided in Algorithm 1. |
| Open Source Code | No | Benchmark results and videos are available at https://ace-rl.github.io/. This statement does not explicitly confirm the availability of source code for the methodology. |
| Open Datasets | Yes | We evaluate ACE across 29 diverse continuous control tasks spanning 7 task domains using a single set of hyperparameters: MuJoCo (Todorov et al., 2012a), Meta-World (Yu et al., 2019a), DeepMind Control Suite (Tassa et al., 2018a), Adroit (Rajeswaran et al., 2018), Shadow Dexterous Hand (Plappert et al., 2018), Panda-gym (Gallouédec et al., 2021b), and ROBEL (Ahn et al., 2020b). |
| Dataset Splits | Yes | We evaluate ACE across 29 diverse continuous control tasks spanning 7 task domains using a single set of hyperparameters: MuJoCo (Todorov et al., 2012a), Meta-World (Yu et al., 2019a), DeepMind Control Suite (Tassa et al., 2018a), Adroit (Rajeswaran et al., 2018), Shadow Dexterous Hand (Plappert et al., 2018), Panda-gym (Gallouédec et al., 2021b), and ROBEL (Ahn et al., 2020b). These are standard benchmark suites with predefined tasks. Because they are online RL environments, training data is generated through interaction rather than drawn from fixed train/test splits. |
| Hardware Specification | Yes | Our experiments were conducted on a server equipped with an AMD EPYC 7763 64-Core Processor (256 threads) and four NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or frameworks. |
| Experiment Setup | Yes | The hyperparameters used for training ACE are outlined in Table 2. We conduct all experiments with this single set of hyperparameters. Hyperparameter values: discount factor γ = 0.99; soft update factor τ = 0.005; learning rate α = 0.0003; batch size N = 512; policy updates per step = 1; value target update interval = 2; sample size for causality N_c = 10000; causality computation interval I = 10000; max reset factor α_max = 0.8; reset interval = 200000; dormancy threshold τ = 0.025. |
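The Table 2 configuration can be made concrete with a short Python sketch. It records the reported values and shows the periodic events they imply. The field names and the `soft_update` and `events_at` helpers are hypothetical and are not taken from the authors' code, whose availability is not confirmed above. The schedule is also an assumption: each interval is taken to fire when the step count is an exact multiple of it. The soft update uses standard Polyak averaging, which is what a soft update factor τ conventionally denotes in off-policy actor-critic methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ACEHyperparams:
    """Values transcribed from Table 2; field names are hypothetical."""
    gamma: float = 0.99                # discount factor
    tau: float = 0.005                 # soft (Polyak) target update factor
    lr: float = 3e-4                   # learning rate
    batch_size: int = 512              # N
    policy_updates_per_step: int = 1
    target_update_interval: int = 2    # value target updates interval
    causal_sample_size: int = 10_000   # N_c, samples used for causality
    causal_interval: int = 10_000      # I, causality computation interval
    max_reset_factor: float = 0.8      # alpha_max
    reset_interval: int = 200_000
    dormancy_threshold: float = 0.025  # also written as tau in Table 2


def soft_update(target: list[float], online: list[float], tau: float) -> list[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


def events_at(step: int, hp: ACEHyperparams) -> list[str]:
    """Periodic events at a positive step count.

    ASSUMPTION: an interval fires when `step` is an exact multiple of it.
    """
    events = []
    if step % hp.target_update_interval == 0:
        events.append("target_update")
    if step % hp.causal_interval == 0:
        events.append("recompute_causal_weights")
    if step % hp.reset_interval == 0:
        events.append("dormancy_guided_reset")
    return events


if __name__ == "__main__":
    hp = ACEHyperparams()
    print(events_at(200_000, hp))
    print(soft_update([0.0, 1.0], [1.0, 1.0], hp.tau))
```

Under this assumed convention, a 1,000,000-step run would recompute causal weights 100 times and trigger 5 resets. The same reported values would give a different count under a different convention, such as counting gradient steps instead of environment steps.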