SMART: Self-supervised Multi-task pretrAining with contRol Transformers
Authors: Yanchao Sun, Shuang Ma, Ratnesh Madaan, Rogerio Bonatti, Furong Huang, Ashish Kapoor
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show by extensive experiments in Deep Mind Control Suite that SMART significantly improves the learning efficiency among seen and unseen downstream tasks and domains under different learning scenarios including Imitation Learning (IL) and Reinforcement Learning (RL). |
| Researcher Affiliation | Collaboration | University of Maryland, College Park. {ycs,furongh}@umd.edu Microsoft Redmond, WA. {shuama,ratnesh.madaan,rbonatti,akapoor}@microsoft.com |
| Pseudocode | No | The paper provides mathematical formulations and architectural diagrams (e.g., Figure 1, Figure 2) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our codebase, pretrained models and datasets are provided at https://github.com/microsoft/smart. |
| Open Datasets | Yes | We evaluate SMART on the Deep Mind Control (DMC) suite (Tassa et al., 2018), which contains a series of continuous control tasks with RGB image observations. ... Our codebase, pretrained models and datasets are provided at https://github.com/microsoft/smart. |
| Dataset Splits | Yes | In pretraining, we use an offline dataset collected over 5 tasks, while the other 5 tasks (with 2 unseen domains) are held out to test the generalizability of SMART. ... Sampled Replay (for RTG): We randomly sample 10% trajectories from the full replay buffer of 1 SAC agent, resulting in a dataset of size 100K per task, with diverse return distribution. Expert (for BC): We select 10% trajectories with the highest returns from the full replay buffer of 1 SAC agent, resulting in an expert dataset of size 100K per task. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments, such as exact CPU or GPU models, memory specifications, or cloud instance types. It mentions using ResNet as an encoder backbone but this refers to a model architecture, not the hardware it runs on. |
| Software Dependencies | No | The paper mentions software components like the 'minGPT implementation' and the 'AdamW optimizer (Loshchilov & Hutter, 2019)' but does not specify their version numbers, which are required for reproducibility. |
| Experiment Setup | Yes | Our implementation of CT is based on a GPT model (Radford et al., 2018) with 8 layers and 8 attention heads. We use context length L = 30 and embedding size d = 256. ... For both pretraining and finetuning, the learning rate is set to be 6×10⁻⁴ and batch size 256. For learning rate, linear warmup and cosine decay are used. ... All models are trained for 10 epochs in pretraining, and 20 epochs for each downstream task. |
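The dataset construction quoted in the Dataset Splits row (a random 10% sample of one SAC agent's replay buffer for the RTG setting, and the top 10% of trajectories by return for BC) can be sketched as below. This is a minimal illustration, not the SMART code: the function name, the flat trajectory representation, and the seed are our own assumptions.

```python
import random

def split_replay_buffer(trajectories, returns, frac=0.10, seed=0):
    """Sketch of the two dataset constructions described in the paper.

    trajectories: list of trajectories from one SAC agent's replay buffer
    returns: per-trajectory returns, same length as `trajectories`
    Returns (sampled_replay, expert): the RTG and BC datasets respectively.
    """
    n = max(1, int(len(trajectories) * frac))
    # Sampled Replay (for RTG): a random 10% with a diverse return distribution.
    rng = random.Random(seed)
    sampled_replay = rng.sample(trajectories, n)
    # Expert (for BC): the 10% of trajectories with the highest returns.
    order = sorted(range(len(trajectories)), key=lambda i: returns[i], reverse=True)
    expert = [trajectories[i] for i in order[:n]]
    return sampled_replay, expert
```

On a 1M-step replay buffer per task, `frac=0.10` yields the 100K-trajectory datasets the paper reports.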
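The hyperparameters quoted in the Experiment Setup row can be gathered into a small sketch, including the stated linear-warmup-then-cosine-decay schedule. This is an illustration under assumptions: the field names are our own, and the warmup length and total step count are not reported in the paper, so the values below are placeholders.

```python
import math
from dataclasses import dataclass

@dataclass
class SmartTrainConfig:
    """Hypothetical container for the hyperparameters quoted above."""
    n_layer: int = 8            # GPT layers
    n_head: int = 8             # attention heads
    context_length: int = 30    # L = 30
    embed_dim: int = 256        # d = 256
    learning_rate: float = 6e-4
    batch_size: int = 256
    pretrain_epochs: int = 10
    finetune_epochs: int = 20
    warmup_steps: int = 1000    # not reported in the paper (assumption)
    total_steps: int = 10_000   # illustrative only

def lr_at_step(cfg: SmartTrainConfig, step: int) -> float:
    """Linear warmup followed by cosine decay, as the paper describes."""
    if step < cfg.warmup_steps:
        return cfg.learning_rate * step / cfg.warmup_steps
    progress = (step - cfg.warmup_steps) / max(1, cfg.total_steps - cfg.warmup_steps)
    return cfg.learning_rate * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
```

With this schedule the rate climbs linearly from 0 to 6×10⁻⁴ over the warmup steps, then decays along a half cosine to 0 at the final step.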