reproducibilityindex.ai

Efficient Planning in a Compact Latent Action Space

Authors: Zhengyao Jiang, Tianjun Zhang, Michael Janner, Yueying Li, Tim Rocktäschel, Edward Grefenstette, Yuandong Tian

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation in the offline RL setting demonstrates low decision latency which is indifferent to the growing raw action dimensionality. For Adroit robotic hand manipulation tasks with high-dimensional continuous action space, TAP surpasses existing model-based methods by a large margin and also beats strong model-free actor-critic baselines.
Researcher Affiliation	Collaboration	Zhengyao Jiang (1), Tianjun Zhang (2), Michael Janner (2), Yueying Li (3), Tim Rocktäschel (1), Edward Grefenstette (1, 4), Yuandong Tian (5). Affiliations: (1) University College London; (2) University of California, Berkeley; (3) Cornell University; (4) Cohere; (5) Meta AI (FAIR).
Pseudocode	Yes	The pseudocode of the TAP beam search is shown in Algorithm 1 in the Appendix.
Open Source Code	Yes	Source code is available at: github.com/ZhengyaoJiang/latentplan.
Open Datasets	Yes	The empirical evaluation of TAP consists of three sets of tasks from D4RL (Fu et al., 2020): gym locomotion control, Ant Maze, and Adroit. ... Following the evaluation protocol of TT and IQL, we use the v2 version of the datasets for locomotion control and v0 for the other tasks.
Dataset Splits	No	The paper mentions using "v2 version of the datasets for locomotion control and v0 for the other tasks" and an "evaluation protocol" but does not explicitly provide details about training, validation, and test splits (e.g., specific percentages or sample counts for each split).
Hardware Specification	Yes	Tests are done on a platform with an i5 12900K CPU and a single RTX3090 GPU.
Software Dependencies	No	The paper mentions software components and architectures like "Transformer", "VQ-VAE", and "PixelCNN" but does not provide specific version numbers for any software dependencies.
Experiment Setup	Yes	As for the TAP-specific hyperparameters: we set the number of the steps associated with each latent variable to be L = 3 and each latent variable has K = 512 candidate values. The planning horizon in the raw action space is 15 for gym locomotion tasks and 24 for Adroit tasks. ... Other hyperparameters including architectures can be found in the Appendix. (See Table 7 for detailed list including learning rate, batch size, etc.)
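
The Experiment Setup row gives L = 3 raw steps per latent variable, K = 512 candidate values, and raw-action planning horizons of 15 (gym locomotion) and 24 (Adroit). The sketch below works out how many latent variables one plan would span under those values. It assumes the raw horizon divides evenly into blocks of L steps; the class and field names are hypothetical and are not taken from the paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapPlanningConfig:
    """Hypothetical container for the TAP-specific values quoted in the table."""

    steps_per_latent: int  # L: raw environment steps covered by one latent variable
    codebook_size: int     # K: candidate values per latent variable
    raw_horizon: int       # planning horizon measured in raw actions

    def latent_horizon(self) -> int:
        """Number of latent variables needed to cover the raw horizon."""
        # Assumption: the horizon is an exact multiple of L.
        if self.raw_horizon % self.steps_per_latent != 0:
            raise ValueError("raw horizon must be a multiple of steps per latent")
        return self.raw_horizon // self.steps_per_latent


# Values as reported in the Experiment Setup row.
GYM_LOCOMOTION = TapPlanningConfig(steps_per_latent=3, codebook_size=512, raw_horizon=15)
ADROIT = TapPlanningConfig(steps_per_latent=3, codebook_size=512, raw_horizon=24)

for name, cfg in (("gym locomotion", GYM_LOCOMOTION), ("Adroit", ADROIT)):
    print(f"{name}: {cfg.latent_horizon()} latent variables per plan")
```

Under these assumptions a plan covers 5 latent variables for gym locomotion and 8 for Adroit, so the search depth depends on the horizon and L rather than on the dimensionality of the raw action space.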