TAAC: Temporally Abstract Actor-Critic for Continuous Control
Authors: Haonan Yu, Wei Xu, Haichao Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate TAAC's advantages over several strong baselines across 14 continuous control tasks. |
| Researcher Affiliation | Industry | Haonan Yu, Wei Xu, Haichao Zhang Horizon Robotics Cupertino, CA 95014 {haonan.yu,wei.xu,haichao.zhang}@horizon.ai |
| Pseudocode | Yes | The overall TAAC algorithm is summarized in Algorithm 1 Appendix A. |
| Open Source Code | Yes | Code is available at https://github.com/hnyu/taac. |
| Open Datasets | Yes | a) Simple Control: Three control tasks (Brockman et al., 2016) with small action and observation spaces: Mountain Car Continuous, Lunar Lander Continuous, and Inverted Double Pendulum; b) Locomotion: Four locomotion tasks (Brockman et al., 2016) that feature complex physics and action spaces: Hopper, Ant, Walker2d, and Half Cheetah; d) Manipulation: Four Fetch (Plappert et al., 2018) tasks with sparse rewards and hard exploration (reward given only upon success): Fetch Reach, Fetch Push, Fetch Slide, and Fetch Pick And Place; e) Driving: One CARLA autonomous-driving task (Dosovitskiy et al., 2017) that has complex high-dimensional multi-modal sensor inputs (camera, radar, IMU, collision, GPS, etc.): Town01. |
| Dataset Splits | No | The paper does not provide explicit details about training, validation, and test dataset splits, such as percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers required to replicate the experiments. |
| Experiment Setup | Yes | Crucially, for fair comparisons we also make each method train 1) for the same number of gradient steps, 2) with the same mini-batch size and learning rate, 3) using roughly the same number of weights, and 4) with a common set of hyperparameters (tuned with vanilla SAC) for the SAC backbone. More details of the experimental settings are described in Appendix J. In our experiments, we set the repeating hyperparameter N to 3 on Simple Control, Locomotion and Manipulation, and to 5 on Terrain and Driving. |
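
The per-category values of the repeating hyperparameter N quoted in the Experiment Setup row could be recorded in a small lookup like the one below. This is only an illustrative sketch: the names `REPEAT_N_BY_CATEGORY` and `repeat_n` are hypothetical and are not taken from the paper or its code release; only the category names and the values 3 and 5 come from the row above.

```python
# Hypothetical config sketch. Only the category names and N values are taken
# from the Experiment Setup row; the identifiers are invented for illustration.
REPEAT_N_BY_CATEGORY = {
    "Simple Control": 3,
    "Locomotion": 3,
    "Manipulation": 3,
    "Terrain": 5,
    "Driving": 5,
}


def repeat_n(category: str) -> int:
    """Return the repeating hyperparameter N quoted for a task category."""
    try:
        return REPEAT_N_BY_CATEGORY[category]
    except KeyError:
        # Fail loudly rather than silently guessing a default N.
        raise ValueError(f"unknown task category: {category!r}") from None
```

Raising an error for an unknown category, instead of falling back to a default, keeps a run from quietly using a value of N that the quoted setup never specified.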