Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalized Decision Transformer for Offline Hindsight Information Matching
Authors: Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment on the OpenAI Gym MuJoCo tasks (HalfCheetah, Hopper, Walker2d, Ant-v3), a common benchmark for continuous control (Brockman et al., 2016; Todorov et al., 2012). Through the experiments, we use medium-expert datasets in D4RL (Fu et al., 2020) to ensure decent data coverage. We sort all the trajectories by their cumulative rewards, hold out the five best trajectories and five 50th-percentile trajectories as a test set (10 trajectories in total), and use the rest as a train set. We report results averaged over 20 rollouts for each of 4 random seeds. |
| Researcher Affiliation | Collaboration | Hiroki Furuta, The University of Tokyo; Yutaka Matsuo, The University of Tokyo; Shixiang Shane Gu, Google Research |
| Pseudocode | Yes | See Algorithm 1 (in Appendix F) for the full pseudocode. |
| Open Source Code | Yes | We share our implementation to ensure reproducibility: https://github.com/frt03/generalized_dt |
| Open Datasets | Yes | Through the experiments, we use medium-expert datasets in D4RL (Fu et al., 2020) to ensure decent data coverage. |
| Dataset Splits | No | The paper states: 'We sort all the trajectories by their cumulative rewards, hold out five best trajectories and five 50 percentile trajectories as a test set (10 trajectories in total), and use the rest as a train set.' This specifies a train/test split, but no explicit mention of a separate validation set for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'pytorch implementation' in Appendix E.4 but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We implement Categorical Decision Transformer and Bi-directional Decision Transformer, built upon the official codebase released by Chen et al. (2021a) (https://github.com/kzl/decision-transformer). We follow most of the hyperparameters they used (Table 6). |
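The train/test split quoted above (sort trajectories by cumulative reward, hold out the five best and five around the 50th percentile as a 10-trajectory test set) can be sketched as follows. This is a minimal illustration, not the authors' code; the trajectory format (a dict with a `"rewards"` list) and the function name are assumptions.

```python
def split_trajectories(trajectories):
    """Illustrative sketch of the split described in the paper.

    trajectories: list of dicts, each with a "rewards" list.
    Returns (train, test) where test holds the five highest-return
    trajectories plus five around the 50th percentile.
    """
    # Sort by cumulative reward, ascending.
    ranked = sorted(trajectories, key=lambda t: sum(t["rewards"]))
    best = ranked[-5:]                  # five best trajectories
    mid = len(ranked) // 2
    median = ranked[mid - 2 : mid + 3]  # five around the 50th percentile
    test = median + best                # 10 trajectories in total
    test_ids = {id(t) for t in test}
    train = [t for t in ranked if id(t) not in test_ids]
    return train, test
```

Note that the paper does not specify how ties or the exact percentile window are handled; the `mid - 2 : mid + 3` slice is one plausible reading.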