Future-conditioned Unsupervised Pretraining for Decision Transformer

Authors: Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Yang Wei, Shuai Li

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate PDT on a set of Gym MuJoCo tasks from the D4RL benchmark (Fu et al., 2020). Compared with its supervised counterpart (Zheng et al., 2022), PDT exhibits very competitive performance, especially when the offline data is far from expert behaviors. Our analysis further verifies that PDT can: 1) make different decisions when conditioned on various target futures, 2) controllably sample futures according to their predicted returns, and 3) efficiently generalize to out-of-distribution tasks.
Researcher Affiliation | Collaboration | 1) John Hopcroft Center for Computer Science, Shanghai Jiao Tong University, Shanghai, China; 2) Tencent AI Lab, Shenzhen, China.
Pseudocode | Yes | Algorithm 1 (Future-conditioned Pretraining) and Algorithm 2 (Online Finetuning); a simplified illustrative sketch of the pretraining step is given after the table.
Open Source Code | Yes | 'Code is available at here.' (the link is given in a footnote later in the paper; the paper also states 'Our PDT implementation is based on the ODT codebase', whose GitHub repository is https://github.com/facebookresearch/online-dt)
Open Datasets | Yes | We evaluate our method on the Gym MuJoCo datasets from D4RL (Fu et al., 2020). (A minimal loading snippet follows the table.)
Dataset Splits | No | The paper states that it uses a 'reward-free offline dataset D' for pretraining and '200k online transitions' for finetuning, but it does not specify explicit train/validation/test splits (as percentages or sample counts) for the offline dataset, nor does it detail a validation split for the online finetuning process.
Hardware Specification | Yes | We conduct our experiments on a GPU cluster with 8 Nvidia 3090 graphic cards.
Software Dependencies | No | The paper mentions using the 'LAMB optimizer' and 'Adam optimizer', but it does not specify versions for these or for other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | Table 3 ('Common hyperparameters that are used to train PDT in all the experiments'), e.g., number of layers 4, number of attention heads 4, embedding dimension 512, batch size 256, learning rate 0.0001. (These values are collected into a config sketch below.)
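
The Pseudocode row above points to Algorithm 1 (Future-conditioned Pretraining). The paper's algorithm is not reproduced here; the following is a deliberately simplified, hedged sketch of the general future-conditioned pretraining idea: a future encoder compresses an upcoming trajectory segment into a latent z, and a policy is trained by behavior cloning conditioned on the past context and z. The MLP modules, dimensions, and the regularization term are illustrative assumptions standing in for the paper's transformer-based design.

```python
# Hedged sketch of future-conditioned pretraining (not the paper's Algorithm 1).
# All architectures, dimensions, and the regularizer are illustrative assumptions.
import torch
import torch.nn as nn

state_dim, act_dim, latent_dim, ctx, horizon = 11, 3, 16, 5, 5

# Future encoder: maps a flattened future segment to a latent embedding z.
future_encoder = nn.Sequential(nn.Linear(horizon * (state_dim + act_dim), 128),
                               nn.ReLU(), nn.Linear(128, latent_dim))
# Policy: predicts the current action from the flattened past context and z.
policy = nn.Sequential(nn.Linear(ctx * state_dim + latent_dim, 128),
                       nn.ReLU(), nn.Linear(128, act_dim))
params = list(future_encoder.parameters()) + list(policy.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

def pretrain_step(past_states, past_actions, future_states, future_actions):
    # Encode the target future into a latent embedding z.
    fut = torch.cat([future_states.flatten(1), future_actions.flatten(1)], dim=-1)
    z = future_encoder(fut)

    # Behavior cloning: predict the most recent action from past context and z.
    ctx_in = torch.cat([past_states.flatten(1), z], dim=-1)
    pred_action = policy(ctx_in)
    bc_loss = ((pred_action - past_actions[:, -1]) ** 2).mean()

    # Keep z well-behaved so futures can be sampled at test time
    # (a crude stand-in for the paper's treatment of future embeddings).
    reg_loss = 1e-3 * (z ** 2).mean()

    loss = bc_loss + reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch of 4 trajectory segments, purely for illustration.
B = 4
loss = pretrain_step(torch.randn(B, ctx, state_dim), torch.randn(B, ctx, act_dim),
                     torch.randn(B, horizon, state_dim), torch.randn(B, horizon, act_dim))
```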
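The Open Datasets row cites the Gym MuJoCo datasets from D4RL. Below is a minimal sketch of how such a dataset is typically loaded with the d4rl package; the environment name 'hopper-medium-v2' is an illustrative choice, not necessarily one used in the paper.

```python
# Minimal sketch (assumption, not from the paper): loading a Gym MuJoCo
# dataset from D4RL.
import gym
import d4rl  # registers the D4RL environments with gym

env = gym.make("hopper-medium-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...

# For reward-free pretraining as described in the paper, only states and
# actions would be kept.
observations = dataset["observations"]
actions = dataset["actions"]
```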
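The Experiment Setup row quotes the common hyperparameters from Table 3. They are collected into a plain config dict here for reference; the key names and dict layout are assumptions, only the values come from the quoted table.

```python
# Common hyperparameters quoted from Table 3 of the paper.
# Key names are illustrative; only the values are from the paper.
pdt_config = {
    "num_layers": 4,
    "num_attention_heads": 4,
    "embedding_dim": 512,
    "batch_size": 256,
    "learning_rate": 1e-4,
}
```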