Generalized Decision Transformer for Offline Hindsight Information Matching
Authors: Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on the OpenAI Gym MuJoCo tasks (HalfCheetah, Hopper, Walker2d, Ant-v3), a common benchmark for continuous control (Brockman et al., 2016; Todorov et al., 2012). Throughout the experiments, we use medium-expert datasets in D4RL (Fu et al., 2020) to ensure decent data coverage. We sort all the trajectories by their cumulative rewards, hold out the five best trajectories and five 50th-percentile trajectories as a test set (10 trajectories in total), and use the rest as a train set. We report the results averaged over 20 rollouts for each of 4 random seeds. |
| Researcher Affiliation | Collaboration | Hiroki Furuta The University of Tokyo furuta@weblab.t.u-tokyo.ac.jp Yutaka Matsuo The University of Tokyo Shixiang Shane Gu Google Research |
| Pseudocode | Yes | See Algorithm 1 (in Appendix F) for the full pseudocode. |
| Open Source Code | Yes | We share our implementation to ensure reproducibility: https://github.com/frt03/generalized_dt |
| Open Datasets | Yes | Throughout the experiments, we use medium-expert datasets in D4RL (Fu et al., 2020) to ensure decent data coverage. |
| Dataset Splits | No | The paper states: 'We sort all the trajectories by their cumulative rewards, hold out five best trajectories and five 50 percentile trajectories as a test set (10 trajectories in total), and use the rest as a train set.' This specifies a train/test split, but no explicit mention of a separate validation set for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'pytorch implementation' in Appendix E.4 but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We implement Categorical Decision Transformer and Bi-directional Decision Transformer, built upon the official codebase released by Chen et al. (2021a) (https://github.com/kzl/decision-transformer). We follow most of their hyperparameters (Table 6). |
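The train/test split described above (sort by cumulative reward, hold out the five best and five 50th-percentile trajectories) can be sketched as follows. This is a minimal illustration, not the authors' code: the trajectory representation (a dict with a `rewards` list) and the exact indexing around the median are assumptions, and tie-breaking may differ in the released implementation.

```python
from typing import Dict, List, Tuple


def split_trajectories(trajectories: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Hold out the 5 best and 5 median trajectories (by cumulative reward)
    as a test set; all remaining trajectories form the train set."""
    # Sort ascending by cumulative (undiscounted) reward.
    ordered = sorted(trajectories, key=lambda t: sum(t["rewards"]))
    n = len(ordered)
    best_idx = set(range(n - 5, n))            # five highest-return trajectories
    mid = n // 2
    median_idx = set(range(mid - 2, mid + 3))  # five around the 50th percentile (assumed windowing)
    test_idx = best_idx | median_idx
    test = [ordered[i] for i in sorted(test_idx)]
    train = [ordered[i] for i in range(n) if i not in test_idx]
    return train, test
```

With a medium-expert dataset of, say, 2000 trajectories, this yields a 10-trajectory test set and a 1990-trajectory train set, matching the split reported in the paper.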