SMART: Self-supervised Multi-task pretrAining with contRol Transformers
Authors: Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show by extensive experiments in Deep Mind Control Suite that SMART significantly improves the learning efficiency among seen and unseen downstream tasks and domains under different learning scenarios including Imitation Learning (IL) and Reinforcement Learning (RL). |
| Researcher Affiliation | Collaboration | University of Maryland, College Park. {ycs,furongh}@umd.edu Microsoft Redmond, WA. {shuama,ratnesh.madaan,rbonatti,akapoor}@microsoft.com |
| Pseudocode | No | The paper provides mathematical formulations and architectural diagrams (e.g., Figure 1, Figure 2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our codebase, pretrained models and datasets are provided at https://github.com/microsoft/smart. |
| Open Datasets | Yes | We evaluate SMART on the Deep Mind Control (DMC) suite (Tassa et al., 2018), which contains a series of continuous control tasks with RGB image observations. ... Our codebase, pretrained models and datasets are provided at https://github.com/microsoft/smart. |
| Dataset Splits | Yes | In pretraining, we use an offline dataset collected over 5 tasks, while the other 5 tasks (with 2 unseen domains) are held out to test the generalizability of SMART. ... Sampled Replay (for RTG): We randomly sample 10% trajectories from the full replay buffer of 1 SAC agent, resulting in a dataset of size 100K per task, with diverse return distribution. Expert (for BC): We select 10% trajectories with the highest returns from the full replay buffer of 1 SAC agent, resulting in an expert dataset of size 100K per task. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as exact CPU or GPU models, memory specifications, or cloud instance types. It mentions using ResNet as an encoder backbone but this refers to a model architecture, not the hardware it runs on. |
| Software Dependencies | No | The paper mentions software components like the 'minGPT implementation' and the 'AdamW optimizer (Loshchilov & Hutter, 2019)' but does not specify their version numbers, which are required for reproducibility. |
| Experiment Setup | Yes | Our implementation of CT is based on a GPT model (Radford et al., 2018) with 8 layers and 8 attention heads. We use context length L = 30 and embedding size d = 256. ... For both pretraining and finetuning, the learning rate is set to be 6×10⁻⁴ and batch size 256. For learning rate, linear warmup and cosine decay are used. ... All models are trained for 10 epochs in pretraining, and 20 epochs for each downstream task. |
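The dataset construction quoted in the Dataset Splits row (a random 10% sample of one SAC agent's replay buffer for the RTG setting, and the top 10% of trajectories by return for BC) can be sketched as below. This is a minimal illustration, not the SMART code: the function name, the flat trajectory representation, and the seed are our own assumptions.

```python
import random

def split_replay_buffer(trajectories, returns, frac=0.10, seed=0):
    """Sketch of the two dataset constructions described in the paper.

    trajectories: list of trajectories from one SAC agent's replay buffer
    returns: per-trajectory returns, same length as `trajectories`
    Returns (sampled_replay, expert): the RTG and BC datasets respectively.
    """
    n = max(1, int(len(trajectories) * frac))
    # Sampled Replay (for RTG): a random 10% with a diverse return distribution.
    rng = random.Random(seed)
    sampled_replay = rng.sample(trajectories, n)
    # Expert (for BC): the 10% of trajectories with the highest returns.
    order = sorted(range(len(trajectories)), key=lambda i: returns[i], reverse=True)
    expert = [trajectories[i] for i in order[:n]]
    return sampled_replay, expert
```

On a 1M-step replay buffer per task, `frac=0.10` yields the 100K-trajectory datasets the paper reports.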
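The hyperparameters quoted in the Experiment Setup row can be gathered into a small sketch, including the stated linear-warmup-then-cosine-decay schedule. This is an illustration under assumptions: the field names are our own, and the warmup length and total step count are not reported in the paper, so the values below are placeholders.

```python
import math
from dataclasses import dataclass

@dataclass
class SmartTrainConfig:
    """Hypothetical container for the hyperparameters quoted above."""
    n_layer: int = 8            # GPT layers
    n_head: int = 8             # attention heads
    context_length: int = 30    # L = 30
    embed_dim: int = 256        # d = 256
    learning_rate: float = 6e-4
    batch_size: int = 256
    pretrain_epochs: int = 10
    finetune_epochs: int = 20
    warmup_steps: int = 1000    # not reported in the paper (assumption)
    total_steps: int = 10_000   # illustrative only

def lr_at_step(cfg: SmartTrainConfig, step: int) -> float:
    """Linear warmup followed by cosine decay, as the paper describes."""
    if step < cfg.warmup_steps:
        return cfg.learning_rate * step / cfg.warmup_steps
    progress = (step - cfg.warmup_steps) / max(1, cfg.total_steps - cfg.warmup_steps)
    return cfg.learning_rate * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

With this schedule the rate climbs linearly from 0 to 6×10⁻⁴ over the warmup steps, then decays along a half cosine to 0 at the final step.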