Masked Autoencoding for Scalable and Generalizable Decision Making
Authors: Fangchen Liu, Hao Liu, Aditya Grover, Pieter Abbeel
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical study, we find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching, and it can zero-shot infer skills from a few example transitions. In addition, MaskDP transfers well to offline RL and shows promising scaling behavior w.r.t. model size. It is amenable to data-efficient finetuning, achieving competitive results with prior methods based on autoregressive pretraining. In our experiments, we evaluate transfer learning in downstream tasks using MaskDP. Section 4.1 introduces the environments, pretraining, and the baselines compared in experiments. Section 4.2 summarizes the results of MaskDP on goal reaching, skill prompting, and offline RL. Through further analysis in Section 4.3, we present an ablation study on various design choices of our model. |
| Researcher Affiliation | Academia | Fangchen Liu1*, Hao Liu1*, Aditya Grover2, Pieter Abbeel1; 1 Berkeley AI Research, UC Berkeley; 2 UCLA |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The implementation of MaskDP is available at https://github.com/FangchenLiu/MaskDP_public |
| Open Datasets | Yes | We adopt the environment setup used in ExORL [37], based on the DeepMind Control Suite [29], where a domain describes the type of agent (e.g., Walker) but tasks are specified by rewards (e.g., Walker Walk, Walker Run). We provide a 2M buffer of the data collected by Proto-RL [36], as ExORL [37] does. |
| Dataset Splits | Yes | Single-goal reaching: for every trajectory in the validation set, we randomly sample a start state and a future state T ∈ [15, 20) steps ahead as the goal. All methods are evaluated on the same set of 300 state-goal pairs with a budget of T + 3 steps. Multi-goal reaching: for every trajectory in the validation set, we randomly sample a start state and 5 goal states at random future timesteps in [12, 60). We evaluate the same set of 100 state-goal sequences and add an additional 5-timestep budget for each goal. (A sketch of this sampling procedure follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not list specific version numbers for software dependencies or libraries used in the experiments (e.g., Python version, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | We pretrain agents for 400K gradient steps. By default, MaskDP uses a 3-layer encoder and 2-layer decoder, and the baselines based on GPT use 5 attention layers. MaskDP and all the above models are comparable, with similar architecture design and size, and share the same training hyper-parameters. Details about the architecture and training of MaskDP and the above baselines can be found in Section A. (An illustrative architecture sketch follows the table.) |
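
As a concrete reading of the goal-reaching splits quoted above, the following is a minimal sketch of how start/goal pairs could be drawn from validation trajectories. It is not taken from the MaskDP codebase: the function names, the use of NumPy, and the assumption that each trajectory is an array of states are ours. Only the horizon ranges ([15, 20) for single-goal, [12, 60) for multi-goal), the 300/100 evaluation-set sizes, and the step budgets come from the quoted setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_single_goal(trajectory, horizon=(15, 20)):
    """Draw one (start, goal) pair with the goal T in [15, 20) steps ahead; budget is T + 3."""
    T = int(rng.integers(*horizon))
    start = int(rng.integers(0, len(trajectory) - horizon[1]))
    return trajectory[start], trajectory[start + T], T + 3  # start state, goal state, step budget

def sample_multi_goal(trajectory, num_goals=5, horizon=(12, 60)):
    """Draw a start state and 5 goal states at increasing future offsets in [12, 60)."""
    start = int(rng.integers(0, len(trajectory) - horizon[1]))
    offsets = np.sort(rng.choice(np.arange(*horizon), size=num_goals, replace=False))
    return trajectory[start], [trajectory[start + t] for t in offsets], offsets

# Hypothetical usage, mirroring the evaluation-set sizes in the quoted setup:
# single_goal_set = [sample_single_goal(traj) for traj in validation_trajectories[:300]]
# multi_goal_set  = [sample_multi_goal(traj) for traj in validation_trajectories[:100]]
```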
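The layer counts in the experiment setup can likewise be read as a bidirectional encoder/decoder transformer in the masked-autoencoder style. The PyTorch sketch below is an illustration under assumptions, not the authors' implementation: the 3-layer encoder / 2-layer decoder counts come from the quoted setup, while the embedding dimension, head count, and the choice of `nn.TransformerEncoder` blocks for both halves are placeholders.

```python
import torch.nn as nn

# Placeholder widths; only the 3-layer encoder / 2-layer decoder counts come from the paper.
EMBED_DIM, N_HEADS = 256, 4

def make_blocks(num_layers):
    """Stack of bidirectional self-attention blocks (no causal mask)."""
    layer = nn.TransformerEncoderLayer(
        d_model=EMBED_DIM, nhead=N_HEADS, dim_feedforward=4 * EMBED_DIM, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)

encoder = make_blocks(num_layers=3)  # processes the visible (unmasked) state/action tokens
decoder = make_blocks(num_layers=2)  # reconstructs the masked tokens from encoder outputs
```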