Efficient Planning in a Compact Latent Action Space
Authors: Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation in the offline RL setting demonstrates low decision latency which is indifferent to the growing raw action dimensionality. For Adroit robotic hand manipulation tasks with high-dimensional continuous action space, TAP surpasses existing model-based methods by a large margin and also beats strong model-free actor-critic baselines. |
| Researcher Affiliation | Collaboration | Zhengyao Jiang¹, Tianjun Zhang², Michael Janner², Yueying Li³, Tim Rocktäschel¹, Edward Grefenstette¹,⁴, Yuandong Tian⁵. ¹University College London, ²University of California, Berkeley, ³Cornell University, ⁴Cohere, ⁵Meta AI (FAIR) |
| Pseudocode | Yes | The pseudocode of the TAP beam search is shown in Algorithm 1 in the Appendix. |
| Open Source Code | Yes | Source code is available at: github.com/ZhengyaoJiang/latentplan. |
| Open Datasets | Yes | The empirical evaluation of TAP consists of three sets of tasks from D4RL (Fu et al., 2020): gym locomotion control, Ant Maze, and Adroit. ... Following the evaluation protocol of TT and IQL, we use the v2 version of the datasets for locomotion control and v0 for the other tasks. |
| Dataset Splits | No | The paper mentions using "v2 version of the datasets for locomotion control and v0 for the other tasks" and an "evaluation protocol" but does not explicitly provide details about training, validation, and test splits (e.g., specific percentages or sample counts for each split). |
| Hardware Specification | Yes | Tests are done on a platform with an i9 12900K CPU and a single RTX3090 GPU. |
| Software Dependencies | No | The paper mentions software components and architectures like "Transformer", "VQ-VAE", and "PixelCNN" but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | As for the TAP-specific hyperparameters: we set the number of the steps associated with each latent variable to be L = 3 and each latent variable has K = 512 candidate values. The planning horizon in the raw action space is 15 for gym locomotion tasks and 24 for Adroit tasks. ... Other hyperparameters including architectures can be found in the Appendix. (See Table 7 for detailed list including learning rate, batch size, etc.) |
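
The hyperparameters quoted in the Experiment Setup row imply how many latent variables a planner has to search over. The sketch below works through that arithmetic. It assumes the raw planning horizon is split evenly into blocks of L steps, one latent variable per block, which is a reading of the quoted text and not a statement from the paper. The class and field names (`TAPPlanningConfig`, `steps_per_latent`, `codebook_size`, `raw_horizon`) are hypothetical and are not taken from the latentplan repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TAPPlanningConfig:
    """Hypothetical container for the TAP hyperparameters quoted above."""

    steps_per_latent: int  # L: raw steps associated with each latent variable
    codebook_size: int     # K: candidate values per latent variable
    raw_horizon: int       # planning horizon measured in raw action steps

    @property
    def num_latents(self) -> int:
        # Assumption: the horizon divides evenly into blocks of L raw steps.
        if self.raw_horizon % self.steps_per_latent != 0:
            raise ValueError("raw_horizon must be a multiple of steps_per_latent")
        return self.raw_horizon // self.steps_per_latent


# Values quoted in the Experiment Setup row: L = 3, K = 512,
# horizon 15 for gym locomotion and 24 for Adroit.
GYM_LOCOMOTION = TAPPlanningConfig(steps_per_latent=3, codebook_size=512, raw_horizon=15)
ADROIT = TAPPlanningConfig(steps_per_latent=3, codebook_size=512, raw_horizon=24)

if __name__ == "__main__":
    for name, cfg in (("gym locomotion", GYM_LOCOMOTION), ("Adroit", ADROIT)):
        print(f"{name}: {cfg.num_latents} latent variables, "
              f"{cfg.codebook_size} candidates each")
```

Under this assumption, a 15-step locomotion plan corresponds to 5 latent variables and a 24-step Adroit plan to 8, each chosen from 512 candidates.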