Efficient Continuous Control with Double Actors and Regularized Critics

Authors: Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li (pp. 7655-7663)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on challenging continuous control benchmarks, MuJoCo and PyBullet, show that DARC significantly outperforms current baselines with higher average return and better sample efficiency. "We perform extensive experiments on two challenging continuous control benchmarks, MuJoCo (Brockman et al. 2016) and PyBullet (Ellenberger 2018), where we compare our DARC algorithm against the current common baselines, including TD3 and Soft Actor-Critic (SAC) (Haarnoja et al. 2018a,b)."
Researcher Affiliation | Academia | Jiafei Lyu (1*), Xiaoteng Ma (2), Jiangpeng Yan (2), Xiu Li (1); 1 Tsinghua Shenzhen International Graduate School, Tsinghua University; 2 Department of Automation, Tsinghua University
Pseudocode | Yes | Algorithm 1: Double Actors Regularized Critics (DARC)
Open Source Code | No | The paper mentions open-sourced implementations for the baselines (Fujimoto 2018; Tianhong 2019), but it does not state that the code for DARC or the methodology described in this paper is publicly available.
Open Datasets | Yes | "We perform extensive experiments on two challenging continuous control benchmarks, MuJoCo (Brockman et al. 2016) and PyBullet (Ellenberger 2018)."
Dataset Splits | No | The paper reports its evaluation protocol ("Each algorithm is repeated with 5 independent seeds and evaluated for 10 times every 5000 timesteps"), but it does not specify explicit train/validation/test dataset splits with percentages or sample counts in the way a supervised learning task would.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions OpenAI Gym, MuJoCo, PyBullet, TD3, and SAC, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions).
Experiment Setup | Yes | "The regularization coefficient is set to be 0.005 by default and the value estimation weight ν is mainly selected from [0, 0.5] with 0.05 as interval by using grid search. We use the same hyperparameters in DARC as the default setting for TD3 on all tasks except Humanoid-v2 where all these methods fail with default hyperparameters. Details for hyperparameters are listed in Appendix E."
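The experiment-setup row above names two key hyperparameters: a regularization coefficient of 0.005 and a value estimation weight ν searched over [0, 0.5] in steps of 0.05. A minimal sketch of what these could look like in code is shown below; the exact blend of min/max critic targets and the function names are assumptions based on the paper's description of double actors and regularized critics, not the authors' released implementation.

```python
import numpy as np

def darc_target(reward, q1_next, q2_next, nu, gamma=0.99):
    # Assumed DARC-style target: weight the pessimistic (min) and
    # optimistic (max) estimates of the two critics by nu (hypothetical form).
    blended = nu * np.minimum(q1_next, q2_next) + (1 - nu) * np.maximum(q1_next, q2_next)
    return reward + gamma * blended

def critic_regularizer(q1, q2, lam=0.005):
    # Soft penalty keeping the two critics close; 0.005 is the
    # default regularization coefficient reported in the paper.
    return lam * np.mean((q1 - q2) ** 2)

# Grid for nu as described: [0, 0.5] with 0.05 as the interval (11 values).
nu_grid = np.round(np.arange(0.0, 0.5 + 1e-9, 0.05), 2)
print(nu_grid)
```

The `1e-9` slack in `np.arange` guards against the floating-point endpoint being excluded; the grid search itself would simply loop over `nu_grid` and train one agent per value.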
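The evaluation protocol quoted in the Dataset Splits row (5 independent seeds, 10 evaluation episodes every 5000 timesteps) implies a fixed evaluation schedule. The sketch below makes that arithmetic concrete; the total training budget of 1e6 timesteps is an assumed value for illustration, not taken from the paper.

```python
# Evaluation schedule from the quoted protocol; the total budget is assumed.
SEEDS = 5
EVAL_INTERVAL = 5000
EVAL_EPISODES = 10
TOTAL_TIMESTEPS = 1_000_000  # assumption, not from the paper

eval_points = range(EVAL_INTERVAL, TOTAL_TIMESTEPS + 1, EVAL_INTERVAL)
total_eval_episodes = SEEDS * len(eval_points) * EVAL_EPISODES
print(len(eval_points), total_eval_episodes)
```

Under this assumed budget, each seed is evaluated 200 times, i.e. 10,000 evaluation episodes across all seeds.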