Efficient Continuous Control with Double Actors and Regularized Critics

Authors: Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li (pp. 7655-7663)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on challenging continuous control benchmarks, MuJoCo and PyBullet, show that DARC significantly outperforms current baselines with higher average return and better sample efficiency. "We perform extensive experiments on two challenging continuous control benchmarks, MuJoCo (Brockman et al. 2016) and PyBullet (Ellenberger 2018), where we compare our DARC algorithm against the current common baselines, including TD3 and Soft Actor-Critic (SAC) (Haarnoja et al. 2018a,b)."
Researcher Affiliation | Academia | Jiafei Lyu (1*), Xiaoteng Ma (2), Jiangpeng Yan (2), Xiu Li (1); 1 Tsinghua Shenzhen International Graduate School, Tsinghua University; 2 Department of Automation, Tsinghua University
Pseudocode | Yes | Algorithm 1: Double Actors Regularized Critics (DARC)
Open Source Code | No | The paper mentions open-sourced implementations for the baselines (Fujimoto 2018; Tianhong 2019), but it does not state that the code for DARC or the methodology described in this paper is publicly available.
Open Datasets | Yes | "We perform extensive experiments on two challenging continuous control benchmarks, MuJoCo (Brockman et al. 2016) and PyBullet (Ellenberger 2018)."
Dataset Splits | No | The paper reports its evaluation protocol ("Each algorithm is repeated with 5 independent seeds and evaluated for 10 times every 5000 timesteps"), but it does not specify explicit train/validation/test dataset splits with percentages or sample counts in the way a supervised learning task would.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions OpenAI Gym, MuJoCo, PyBullet, TD3, and SAC, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions).
Experiment Setup | Yes | "The regularization coefficient is set to be 0.005 by default and the value estimation weight ν is mainly selected from [0, 0.5] with 0.05 as interval by using grid search. We use the same hyperparameters in DARC as the default setting for TD3 on all tasks except Humanoid-v2 where all these methods fail with default hyperparameters. Details for hyperparameters are listed in Appendix E."
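The experiment-setup row above names two key hyperparameters: a regularization coefficient of 0.005 and a value estimation weight ν searched over [0, 0.5] in steps of 0.05. A minimal sketch of what these could look like in code is shown below; the exact blend of min/max critic targets and the function names are assumptions based on the paper's description of double actors and regularized critics, not the authors' released implementation.

```python
import numpy as np

def darc_target(reward, q1_next, q2_next, nu, gamma=0.99):
    # Assumed DARC-style target: weight the pessimistic (min) and
    # optimistic (max) estimates of the two critics by nu (hypothetical form).
    blended = nu * np.minimum(q1_next, q2_next) + (1 - nu) * np.maximum(q1_next, q2_next)
    return reward + gamma * blended

def critic_regularizer(q1, q2, lam=0.005):
    # Soft penalty keeping the two critics close; 0.005 is the
    # default regularization coefficient reported in the paper.
    return lam * np.mean((q1 - q2) ** 2)

# Grid for nu as described: [0, 0.5] with 0.05 as the interval (11 values).
nu_grid = np.round(np.arange(0.0, 0.5 + 1e-9, 0.05), 2)
print(nu_grid)
```

The `1e-9` slack in `np.arange` guards against the floating-point endpoint being excluded; the grid search itself would simply loop over `nu_grid` and train one agent per value.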
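The evaluation protocol quoted in the Dataset Splits row (5 independent seeds, 10 evaluation episodes every 5000 timesteps) implies a fixed evaluation schedule. The sketch below makes that arithmetic concrete; the total training budget of 1e6 timesteps is an assumed value for illustration, not taken from the paper.

```python
# Evaluation schedule from the quoted protocol; the total budget is assumed.
SEEDS = 5
EVAL_INTERVAL = 5000
EVAL_EPISODES = 10
TOTAL_TIMESTEPS = 1_000_000  # assumption, not from the paper

eval_points = range(EVAL_INTERVAL, TOTAL_TIMESTEPS + 1, EVAL_INTERVAL)
total_eval_episodes = SEEDS * len(eval_points) * EVAL_EPISODES
print(len(eval_points), total_eval_episodes)
```

Under this assumed budget, each seed is evaluated 200 times, i.e. 10,000 evaluation episodes across all seeds.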