Future-conditioned Unsupervised Pretraining for Decision Transformer

Authors: Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Yang Wei, Shuai Li

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PDT on a set of Gym MuJoCo tasks from the D4RL benchmark (Fu et al., 2020). Compared with its supervised counterpart (Zheng et al., 2022), PDT exhibits very competitive performance, especially when the offline data is far from expert behaviors. Our analysis further verifies that PDT can: 1) make different decisions when conditioned on various target futures, 2) controllably sample futures according to their predicted returns, and 3) efficiently generalize to out-of-distribution tasks.
Researcher Affiliation | Collaboration | 1) John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China; 2) Tencent AI Lab, Shenzhen, China.
Pseudocode | Yes | Algorithm 1 (Future-conditioned Pretraining) and Algorithm 2 (Online Finetuning); a simplified illustrative sketch of the pretraining step is given after the table.
Open Source Code | Yes | 'Code is available at here.' (the link is given in a footnote later in the paper; the paper also states 'Our PDT implementation is based on the ODT codebase', whose GitHub repository is https://github.com/facebookresearch/online-dt)
Open Datasets | Yes | We evaluate our method on the Gym MuJoCo datasets from D4RL (Fu et al., 2020). (A minimal loading snippet follows the table.)
Dataset Splits | No | The paper states that it uses a 'reward-free offline dataset D' for pretraining and '200k online transitions' for finetuning, but it does not specify explicit train/validation/test splits (as percentages or sample counts) for the offline dataset, nor does it detail a validation split for the online finetuning process.
Hardware Specification | Yes | We conduct our experiments on a GPU cluster with 8 Nvidia 3090 graphic cards.
Software Dependencies | No | The paper mentions using the 'LAMB optimizer' and 'Adam optimizer', but it does not specify versions for these or for other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | Table 3 ('Common hyperparameters that are used to train PDT in all the experiments'), e.g., number of layers 4, number of attention heads 4, embedding dimension 512, batch size 256, learning rate 0.0001. (These values are collected into a config sketch below.)
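
The Pseudocode row above points to Algorithm 1 (Future-conditioned Pretraining). The paper's algorithm is not reproduced here; the following is a deliberately simplified, hedged sketch of the general future-conditioned pretraining idea: a future encoder compresses an upcoming trajectory segment into a latent z, and a policy is trained by behavior cloning conditioned on the past context and z. The MLP modules, dimensions, and the regularization term are illustrative assumptions standing in for the paper's transformer-based design.

```python
# Hedged sketch of future-conditioned pretraining (not the paper's Algorithm 1).
# All architectures, dimensions, and the regularizer are illustrative assumptions.
import torch
import torch.nn as nn

state_dim, act_dim, latent_dim, ctx, horizon = 11, 3, 16, 5, 5

# Future encoder: maps a flattened future segment to a latent embedding z.
future_encoder = nn.Sequential(nn.Linear(horizon * (state_dim + act_dim), 128),
                               nn.ReLU(), nn.Linear(128, latent_dim))
# Policy: predicts the current action from the flattened past context and z.
policy = nn.Sequential(nn.Linear(ctx * state_dim + latent_dim, 128),
                       nn.ReLU(), nn.Linear(128, act_dim))
params = list(future_encoder.parameters()) + list(policy.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

def pretrain_step(past_states, past_actions, future_states, future_actions):
    # Encode the target future into a latent embedding z.
    fut = torch.cat([future_states.flatten(1), future_actions.flatten(1)], dim=-1)
    z = future_encoder(fut)

    # Behavior cloning: predict the most recent action from past context and z.
    ctx_in = torch.cat([past_states.flatten(1), z], dim=-1)
    pred_action = policy(ctx_in)
    bc_loss = ((pred_action - past_actions[:, -1]) ** 2).mean()

    # Keep z well-behaved so futures can be sampled at test time
    # (a crude stand-in for the paper's treatment of future embeddings).
    reg_loss = 1e-3 * (z ** 2).mean()

    loss = bc_loss + reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch of 4 trajectory segments, purely for illustration.
B = 4
loss = pretrain_step(torch.randn(B, ctx, state_dim), torch.randn(B, ctx, act_dim),
                     torch.randn(B, horizon, state_dim), torch.randn(B, horizon, act_dim))
```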
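The Open Datasets row cites the Gym MuJoCo datasets from D4RL. Below is a minimal sketch of how such a dataset is typically loaded with the d4rl package; the environment name 'hopper-medium-v2' is an illustrative choice, not necessarily one used in the paper.

```python
# Minimal sketch (assumption, not from the paper): loading a Gym MuJoCo
# dataset from D4RL.
import gym
import d4rl  # registers the D4RL environments with gym

env = gym.make("hopper-medium-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...

# For reward-free pretraining as described in the paper, only states and
# actions would be kept.
observations = dataset["observations"]
actions = dataset["actions"]
```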
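The Experiment Setup row quotes the common hyperparameters from Table 3. They are collected into a plain config dict here for reference; the key names and dict layout are assumptions, only the values come from the quoted table.

```python
# Common hyperparameters quoted from Table 3 of the paper.
# Key names are illustrative; only the values are from the paper.
pdt_config = {
    "num_layers": 4,
    "num_attention_heads": 4,
    "embedding_dim": 512,
    "batch_size": 256,
    "learning_rate": 1e-4,
}
```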