Generalized Decision Transformer for Offline Hindsight Information Matching

Authors: Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on the OpenAI Gym MuJoCo tasks (HalfCheetah, Hopper, Walker2d, Ant-v3), a common benchmark for continuous control (Brockman et al., 2016; Todorov et al., 2012). Throughout the experiments, we use medium-expert datasets in D4RL (Fu et al., 2020) to ensure decent data coverage. We sort all the trajectories by their cumulative rewards, hold out the five best trajectories and five 50th-percentile trajectories as a test set (10 trajectories in total), and use the rest as a train set. We report the results averaged over 20 rollouts across 4 random seeds.
Researcher Affiliation | Collaboration | Hiroki Furuta (The University of Tokyo, furuta@weblab.t.u-tokyo.ac.jp); Yutaka Matsuo (The University of Tokyo); Shixiang Shane Gu (Google Research)
Pseudocode | Yes | See Algorithm 1 (in Appendix F) for the full pseudocode.
Open Source Code | Yes | We share our implementation to ensure reproducibility: https://github.com/frt03/generalized_dt
Open Datasets | Yes | Throughout the experiments, we use medium-expert datasets in D4RL (Fu et al., 2020) to ensure decent data coverage.
Dataset Splits | No | The paper states: 'We sort all the trajectories by their cumulative rewards, hold out five best trajectories and five 50 percentile trajectories as a test set (10 trajectories in total), and use the rest as a train set.' This specifies a train/test split, but there is no explicit mention of a separate validation set for hyperparameter tuning or early stopping.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper mentions a 'pytorch implementation' in Appendix E.4 but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We implement Categorical Decision Transformer and Bi-directional Decision Transformer, built upon the official codebase released by Chen et al. (2021a) (https://github.com/kzl/decision-transformer). We follow most of the hyperparameters they used (Table 6).
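The held-out split quoted above (the five best trajectories plus five around the 50th percentile, with the remainder used for training) can be sketched as follows. This is a minimal illustration, not the authors' code: the list-of-dicts trajectory format and the `rewards` field name are hypothetical, and real trajectories would come from the D4RL medium-expert datasets.

```python
def split_trajectories(trajectories):
    """Hold out the 5 best and 5 median-return trajectories as a test set.

    `trajectories` is assumed to be a list of dicts, each with a "rewards"
    list; this schema is hypothetical, chosen only for illustration.
    """
    # Sort by cumulative (undiscounted) reward, ascending.
    ranked = sorted(trajectories, key=lambda t: sum(t["rewards"]))
    n = len(ranked)

    best = ranked[-5:]                     # five highest-return trajectories
    mid_start = n // 2 - 2                 # five centered on the 50th percentile
    median = ranked[mid_start:mid_start + 5]

    test = best + median                   # 10 trajectories in total
    test_ids = {id(t) for t in test}
    train = [t for t in ranked if id(t) not in test_ids]
    return train, test
```

Evaluation conditioned on the best test trajectories probes extrapolation toward expert behavior, while the median ones probe matching mid-range returns, which is consistent with the paper's stated motivation for holding out both groups.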