Masked Trajectory Models for Prediction, Representation, and Control

Authors: Philipp Wu, Arjun Majumdar, Kevin Stone, Yixin Lin, Igor Mordatch, Pieter Abbeel, Aravind Rajeswaran

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments in several continuous control tasks, we show that the same MTM network (i.e., the same weights) can match or outperform specialized networks trained for the aforementioned capabilities. Additionally, we find that state representations learned by MTM can significantly accelerate the learning speed of traditional RL algorithms. Finally, in offline RL benchmarks, we find that MTM is competitive with specialized offline RL algorithms, despite MTM being a generic self-supervised learning method without any explicit RL components.
Researcher Affiliation | Collaboration | Philipp Wu (1,2), Arjun Majumdar (3), Kevin Stone (1), Yixin Lin (1), Igor Mordatch (4), Pieter Abbeel (2), Aravind Rajeswaran (1); equal contribution. Affiliations: 1 Meta AI, 2 UC Berkeley, 3 Georgia Tech, 4 Google Research. Correspondence to: Philipp Wu <philippwu@berkeley.edu>.
Pseudocode | No | The paper describes the model and training process but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/facebookresearch/mtm.
Open Datasets | Yes | D4RL (Fu et al., 2020) is a popular offline RL benchmark consisting of several environments and datasets. Following a number of prior works, we focus on the locomotion subset: Walker2D, Hopper, and HalfCheetah. For each environment, we consider 4 different dataset settings: Expert, Medium-Expert, Medium, and Medium-Replay. The Expert dataset is useful for benchmarking imitation learning with BC, while the other datasets enable studying offline RL and other capabilities of MTM such as future prediction and inverse dynamics. Adroit (Rajeswaran et al., 2018) is a collection of dexterous manipulation tasks with a simulated five-fingered hand. We experiment with the Pen and Door tasks, which test an agent's ability to carefully coordinate a large action space to accomplish complex robot manipulation tasks. We collect Medium-Replay and Expert trajectories for each task using a protocol similar to D4RL. ExORL (Yarats et al., 2022) is a dataset of trajectories collected using various unsupervised exploration algorithms. Yarats et al. (2022) showed that TD3 (Fujimoto et al., 2018a) can be effectively used to learn in this benchmark. We use data collected by a ProtoRL agent (Yarats et al., 2021) in the Walker2D environment to learn three different tasks: Stand, Walk, and Run.
Dataset Splits | Yes | For all experiments we train on 95% of the dataset and reserve 5% of the data for evaluation.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run its experiments.
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, GELU, LayerNorm, the MuJoCo simulator, and OpenAI Gym, but it does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | C. Model and Training Details.
C.1. MLP Baseline Hyperparameters (Table C.1): Nonlinearity: GELU; Batch Size: 4096; Embedding Dim: 1024; # of Layers: 2; Optimizer: Adam; Learning Rate: 0.0002; Weight Decay: 0.005; Warmup Steps: 5000; Training Steps: 140000; Scheduler: cosine decay.
C.2. MTM Model Hyperparameters (Table C.2): General: Nonlinearity: GELU; Batch Size: 1024; Trajectory-Segment Length: 4; Scheduler: cosine decay; Warmup Steps: 40000; Training Steps: 140000; Dropout: 0.10; Learning Rate: 0.0001; Weight Decay: 0.01. Bidirectional Transformer: # of Encoder Layers: 2; # of Decoder Layers: 1; # of Heads: 4; Embedding Dim: 512. Mode Decoding Head: Number of Layers: 2; Embedding Dim: 512.
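For context on the Open Datasets row, the D4RL locomotion datasets listed above can typically be loaded through the standard d4rl package. This is a minimal sketch using the public d4rl/gym API, not the paper's exact data pipeline; the choice of the walker2d-medium-v2 environment ID is only an illustrative example.

import gym
import d4rl  # registers the D4RL environments with gym

# One Walker2D setting among Expert, Medium-Expert, Medium, and Medium-Replay.
env = gym.make("walker2d-medium-v2")
dataset = env.get_dataset()  # dict with observations, actions, rewards, terminals, timeouts

print(dataset["observations"].shape, dataset["actions"].shape)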
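For the Dataset Splits row, the paper reports training on 95% of the data and holding out 5% for evaluation. A minimal sketch of such a split is shown below; splitting at the trajectory level, the random seed, and the function name are assumptions, not details taken from the paper or released code.

import numpy as np

def train_eval_split(trajectories, eval_frac=0.05, seed=0):
    """Hold out eval_frac of trajectories for evaluation (trajectory-level split is an assumption)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(trajectories))
    n_eval = max(1, int(round(len(trajectories) * eval_frac)))
    eval_idx, train_idx = order[:n_eval], order[n_eval:]
    train = [trajectories[i] for i in train_idx]
    held_out = [trajectories[i] for i in eval_idx]
    return train, held_out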
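The MTM hyperparameters reported in Table C.2 can be collected into a single configuration object. The sketch below mirrors the reported values; the class and field names are illustrative and are not taken from the released code.

from dataclasses import dataclass

@dataclass
class MTMConfig:
    # General settings (Table C.2); values as reported, field names are assumptions.
    nonlinearity: str = "gelu"
    batch_size: int = 1024
    traj_segment_length: int = 4
    scheduler: str = "cosine_decay"
    warmup_steps: int = 40_000
    training_steps: int = 140_000
    dropout: float = 0.10
    learning_rate: float = 1e-4
    weight_decay: float = 0.01
    # Bidirectional transformer.
    n_encoder_layers: int = 2
    n_decoder_layers: int = 1
    n_heads: int = 4
    embedding_dim: int = 512
    # Mode decoding head.
    head_layers: int = 2
    head_embedding_dim: int = 512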