Future-conditioned Unsupervised Pretraining for Decision Transformer
Authors: Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PDT on a set of Gym MuJoCo tasks from the D4RL benchmark (Fu et al., 2020). Compared with its supervised counterpart (Zheng et al., 2022), PDT exhibits very competitive performance, especially when the offline data is far from expert behaviors. Our analysis further verifies that PDT can: 1) make different decisions when conditioned on various target futures, 2) controllably sample futures according to their predicted returns, and 3) efficiently generalize to out-of-distribution tasks. |
| Researcher Affiliation | Collaboration | (1) John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China; (2) Tencent AI Lab, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1 (Future-conditioned Pretraining) and Algorithm 2 (Online Finetuning); a minimal sketch of the pretraining step follows the table. |
| Open Source Code | Yes | The paper states 'Code is available at here.' (the link is provided in a footnote) and notes that 'Our PDT implementation is based on the ODT codebase', whose footnote links to https://github.com/facebookresearch/online-dt. |
| Open Datasets | Yes | We evaluate our method on the Gym MuJoCo datasets from D4RL (Fu et al., 2020). (A loading sketch follows the table.) |
| Dataset Splits | No | The paper states that it uses a 'reward-free offline dataset D' for pretraining and '200k online transitions' for finetuning, but it does not specify explicit train/validation/test splits (as percentages or sample counts) for the offline dataset, nor does it describe a validation split for the online finetuning process. |
| Hardware Specification | Yes | We conduct our experiments on a GPU cluster with 8 Nvidia 3090 graphic cards. |
| Software Dependencies | No | The paper mentions using the 'LAMB optimizer' and 'Adam optimizer' but does not give version numbers for its software dependencies, such as Python, PyTorch, or the optimizer implementations. |
| Experiment Setup | Yes | Table 3 lists the common hyperparameters used to train PDT in all experiments (e.g., 4 layers, 4 attention heads, embedding dimension 512, batch size 256, learning rate 0.0001). A config sketch of these values follows the table. |
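
To make the Pseudocode row concrete, here is a minimal, hedged PyTorch sketch of what a future-conditioned pretraining step (Algorithm 1) looks like. This is not the authors' implementation: the module names (`FutureEncoder`, `FutureConditionedPolicy`), the GRU/MLP architectures, and the toy dimensions are assumptions. The real PDT uses a GPT-style transformer built on the ODT codebase and additionally learns a prior over the future embedding so that futures can later be sampled according to predicted return.

```python
# Hedged sketch of future-conditioned pretraining (Algorithm 1), NOT the authors' code.
# Module names, architectures, and dimensions below are assumptions for illustration.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, Z_DIM, HIDDEN = 17, 6, 16, 128  # assumed toy sizes


class FutureEncoder(nn.Module):
    """Embeds a future sub-trajectory (states and actions) into a latent z."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACT_DIM, HIDDEN, batch_first=True)
        self.to_z = nn.Linear(HIDDEN, Z_DIM)

    def forward(self, future_states, future_actions):
        x = torch.cat([future_states, future_actions], dim=-1)
        _, h = self.rnn(x)
        return self.to_z(h[-1])  # (batch, Z_DIM)


class FutureConditionedPolicy(nn.Module):
    """Predicts the current action from the current state and the future embedding."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + Z_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACT_DIM),
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))


encoder, policy = FutureEncoder(), FutureConditionedPolicy()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(policy.parameters()), lr=1e-4
)

# One pretraining step on random tensors standing in for a reward-free offline batch.
states = torch.randn(256, STATE_DIM)
actions = torch.randn(256, ACT_DIM)
future_s = torch.randn(256, 20, STATE_DIM)  # the sub-trajectory that follows `states`
future_a = torch.randn(256, 20, ACT_DIM)

z = encoder(future_s, future_a)                     # embed the observed future
loss = ((policy(states, z) - actions) ** 2).mean()  # action prediction conditioned on z
optimizer.zero_grad()
loss.backward()
optimizer.step()
```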
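The Open Datasets row points to the D4RL Gym MuJoCo datasets. Below is a minimal loading sketch, assuming the `gym` and `d4rl` packages are installed; the dataset name `hopper-medium-v2` is an illustrative choice, not a claim about the exact dataset versions used in the paper.

```python
# Hedged sketch for loading a D4RL Gym MuJoCo dataset; "hopper-medium-v2" is an example.
import gym
import d4rl  # noqa: F401  (importing registers the D4RL environments with gym)

env = gym.make("hopper-medium-v2")
dataset = env.get_dataset()  # dict of numpy arrays: observations, actions, rewards, terminals, ...

# PDT pretrains on reward-free data, so only states and actions would be consumed here.
observations, actions = dataset["observations"], dataset["actions"]
print(observations.shape, actions.shape)
```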
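Finally, a hedged config sketch collecting the hyperparameters quoted in the Experiment Setup row (Table 3 of the paper). Only the numeric values come from that row; the dictionary layout and key names are assumptions for illustration.

```python
# Common PDT hyperparameters as quoted from Table 3; key names are assumed.
pdt_common_hparams = {
    "num_layers": 4,            # transformer blocks
    "num_attention_heads": 4,   # attention heads per block
    "embedding_dim": 512,       # token embedding dimension
    "batch_size": 256,
    "learning_rate": 1e-4,
}
```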