Hierarchical Visuomotor Control of Humanoids
Authors: Josh Merel, Arun Ahuja, Vu Pham, Saran Tunyasuvunakool, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Greg Wayne
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 EXPERIMENTS: We compared the various approaches on a variety of tasks implemented in MuJoCo (Todorov et al., 2012). The core tasks we considered for the main comparisons were Go-to-target, wall navigation (Walls), running on gapped platforms (Gaps), foraging for colored ball rewards (Forage), and a foraging task requiring the agent to remember the reward value of the different colored balls (Heterogeneous Forage) (see Fig. 5). |
| Researcher Affiliation | Industry | DeepMind, London, UK {jsmerel,arahuja,vuph,stunya,liusiqi,dhruvat,heess,gregwayne}@google.com |
| Pseudocode | No | The paper does not contain explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | The data used in this project was obtained from mocap.cs.cmu.edu. The database was created with funding from NSF EIA-019621. |
| Dataset Splits | No | The paper describes training using a replay buffer and distributed actors, but does not specify explicit training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | The learner ran on a single Pascal 100 or Volta 100 GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam' and 'MuJoCo' but does not provide specific version numbers for these or other key software dependencies. |
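| Experiment Setup | Yes | Table 1: Parameters for training the agent on different environments/tasks. Columns: task, unroll, LSTM state size, value MLP, gamma, replay size. Go To Target: unroll 10, LSTM state size 128, value MLP (128, 1), gamma 0.99, replay size 10^6. Walls / Gaps: unroll 20, LSTM state size 128, value MLP (128, 1), gamma 0.99, replay size 10^4. Forage: unroll 50, LSTM state size 256, value MLP (200, 200, 1), gamma 0.995, replay size 10^4. Heterogeneous Forage: unroll 200, LSTM state size 256, value MLP (200, 200, 1), gamma 0.99, replay size 10^5. |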
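
The Experiment Setup values can be captured as a small configuration table for anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the names `TaskParams`, `TASK_PARAMS`, and `effective_horizon` are hypothetical, and the replay sizes are read as powers of ten (the extracted "106", "104", "105" taken as 10^6, 10^4, 10^5), which is an assumption about superscripts lost in extraction.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class TaskParams:
    """Per-task training parameters transcribed from Table 1 (hypothetical container)."""
    unroll: int                 # unroll length
    lstm_state_size: int        # LSTM state size
    value_mlp: Tuple[int, ...]  # layer sizes of the value MLP
    gamma: float                # discount factor
    replay_size: int            # replay size (assumed power-of-ten reading)


# Keys follow the task names used in Table 1.
TASK_PARAMS: Dict[str, TaskParams] = {
    "Go To Target": TaskParams(10, 128, (128, 1), 0.99, 10**6),
    "Walls / Gaps": TaskParams(20, 128, (128, 1), 0.99, 10**4),
    "Forage": TaskParams(50, 256, (200, 200, 1), 0.995, 10**4),
    "Heterogeneous Forage": TaskParams(200, 256, (200, 200, 1), 0.99, 10**5),
}


def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon 1 / (1 - gamma); a standard heuristic, not from the paper."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    for task, params in TASK_PARAMS.items():
        print(f"{task}: unroll={params.unroll}, "
              f"horizon~{effective_horizon(params.gamma):.0f} steps")
```

Keeping the parameters in one frozen structure makes it easy to spot how the tasks differ: for example, Forage is the only task with a discount of 0.995, and Heterogeneous Forage uses the longest unroll.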