Efficient Continuous Control with Double Actors and Regularized Critics
Authors: Jiafei Lyu, Xiaoteng Ma, Jiangpeng Yan, Xiu Li (pp. 7655–7663)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on challenging continuous control benchmarks, MuJoCo and PyBullet, show that DARC significantly outperforms current baselines with higher average return and better sample efficiency. We perform extensive experiments on two challenging continuous control benchmarks, MuJoCo (Brockman et al. 2016) and PyBullet (Ellenberger 2018), where we compare our DARC algorithm against the current common baselines, including TD3 and Soft Actor-Critic (SAC) (Haarnoja et al. 2018a,b). |
| Researcher Affiliation | Academia | Jiafei Lyu1*, Xiaoteng Ma2, Jiangpeng Yan2, Xiu Li1 — 1 Tsinghua Shenzhen International Graduate School, Tsinghua University; 2 Department of Automation, Tsinghua University |
| Pseudocode | Yes | Algorithm 1: Double Actors Regularized Critics (DARC) |
| Open Source Code | No | The paper mentions open-sourced implementations for baselines (Fujimoto 2018; Tianhong 2019), but it does not state that the code for DARC or the methodology described in this paper is publicly available. |
| Open Datasets | Yes | We perform extensive experiments on two challenging continuous control benchmarks, MuJoCo (Brockman et al. 2016) and PyBullet (Ellenberger 2018) |
| Dataset Splits | No | The paper mentions running experiments for a certain number of timesteps and seeds ('Each algorithm is repeated with 5 independent seeds and evaluated for 10 times every 5000 timesteps'), but it does not specify explicit train/validation/test dataset splits with percentages or sample counts in the way a supervised learning task would. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using OpenAI Gym, MuJoCo, PyBullet, TD3, and SAC, but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions). |
| Experiment Setup | Yes | The regularization coefficient is set to be 0.005 by default and the value estimation weight ν is mainly selected from [0, 0.5] with 0.05 as interval by using grid search. We use the same hyperparameters in DARC as the default setting for TD3 on all tasks except Humanoid-v2 where all these methods fail with default hyperparameters. Details for hyperparameters are listed in Appendix E. |
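The experiment-setup row above mentions two concrete quantities: a critic regularization coefficient of 0.005 and a value-estimation weight ν grid-searched over [0, 0.5] in steps of 0.05. The following is a minimal sketch of how those quantities could enter a DARC-style update; the function names are hypothetical, and the ν-weighted min/max target mix and the squared-difference regularizer are assumptions based on the paper's description, not a verified reproduction of Algorithm 1.

```python
import numpy as np

# Paper's stated defaults (see the Experiment Setup row above).
REG_COEF = 0.005                              # critic regularization coefficient
NU_GRID = np.arange(0.0, 0.5 + 1e-9, 0.05)   # grid-search values for nu

def darc_target(reward, q1_next, q2_next, nu, gamma=0.99, done=False):
    """Assumed target: mix the min and max of the two critics'
    next-state estimates with weight nu, then bootstrap."""
    mix = nu * min(q1_next, q2_next) + (1.0 - nu) * max(q1_next, q2_next)
    return reward + (0.0 if done else gamma * mix)

def critic_losses(q1, q2, target):
    """Assumed per-sample losses: MSE to the shared target plus a
    regularizer (coefficient 0.005) pulling the two critics together."""
    reg = REG_COEF * (q1 - q2) ** 2
    return (q1 - target) ** 2 + reg, (q2 - target) ** 2 + reg
```

For example, with reward 1.0, next-state estimates 2.0 and 4.0, and ν = 0.5, the target is 1.0 + 0.99 × 3.0 = 3.97; the grid contains 11 candidate ν values.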