Rethinking Decision Transformer via Hierarchical Reinforcement Learning

Authors: Yi Ma, Jianye Hao, Hebin Liang, Chenjun Xiao

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results clearly show that the proposed algorithms significantly surpass DT on several control and navigation benchmarks.
Researcher Affiliation | Collaboration | 1 College of Intelligence and Computing, Tianjin University; 2 Huawei Noah's Ark Lab; 3 The Chinese University of Hong Kong, Shenzhen. Correspondence to: Jianye Hao <jianye.hao@tju.edu.cn>.
Pseudocode | No | The paper does not provide pseudocode or a clearly labeled algorithm block for its proposed methods (ADT, V-ADT, G-ADT).
Open Source Code | Yes | Codes for reproducing our results are provided here.
Open Datasets | Yes | We leverage datasets across several domains including Gym-MuJoCo, AntMaze, and Franka Kitchen from the offline RL benchmark D4RL (Fu et al., 2020).
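For context, the D4RL datasets cited in the row above are typically loaded through the d4rl Python package. The minimal sketch below assumes d4rl and gym are installed; the environment id 'halfcheetah-medium-v2' is only an illustrative example, not a setting taken from the paper.

    import gym
    import d4rl  # importing d4rl registers the D4RL offline environments with gym

    # Illustrative task; the paper uses Gym-MuJoCo, AntMaze, and Franka Kitchen datasets from D4RL.
    env = gym.make('halfcheetah-medium-v2')

    # Returns a dict of NumPy arrays with keys such as 'observations', 'actions',
    # 'rewards', 'next_observations', and 'terminals'.
    dataset = d4rl.qlearning_dataset(env)
    print(dataset['observations'].shape)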
Dataset Splits | No | The paper mentions training on 'trajectory data' and evaluation but does not specify explicit percentages or sample counts for training, validation, or test splits. It uses D4RL datasets, which often have standard splits, but these are not stated in the paper's text.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as GPU models (e.g., NVIDIA A100), CPU models, or cloud instance types with their specifications.
Software Dependencies | No | The paper mentions using 'PyTorch' and building on the 'CORL' and 'IQL'/'HIQL' codebases, but it does not provide specific version numbers for these or other software dependencies necessary for reproduction (e.g., 'PyTorch 1.9', 'Python 3.8').
Experiment Setup | Yes | We provide the lower-level actor's hyper-parameters used in our experiments in Table 5. Most hyper-parameters are set following the default configurations in DT. For the inverse temperature used in calculating the AWR loss of the lower-level actor in V-ADT, we set it to 1.0, 3.0, 6.0, 6.0, 6.0, 15.0 for the antmaze-umaze, umaze-diverse, medium-diverse, medium-play, large-diverse, large-play datasets, respectively; for other datasets, it is set to 3.0. As for G-ADT, the inverse temperature is set to 1.0 for all datasets. For the critic used in V-ADT and G-ADT, we follow the default architecture and learning settings in IQL (Kostrikov et al., 2022) and HIQL (Park et al., 2023), respectively. Detailed settings of other hyper-parameters are provided in Appendix A.2.
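To make the quoted settings concrete, here is a hypothetical sketch of the inverse-temperature configuration and the standard AWR weighting it enters. The dictionary keys mirror the dataset names quoted above, while the function and variable names (and the weight clip) are illustrative assumptions, not taken from the authors' code.

    import numpy as np

    # Inverse temperatures for the lower-level actor's AWR loss in V-ADT, as quoted above.
    V_ADT_INV_TEMPERATURE = {
        'antmaze-umaze': 1.0,
        'umaze-diverse': 3.0,
        'medium-diverse': 6.0,
        'medium-play': 6.0,
        'large-diverse': 6.0,
        'large-play': 15.0,
    }
    V_ADT_DEFAULT_INV_TEMPERATURE = 3.0  # all other V-ADT datasets
    G_ADT_INV_TEMPERATURE = 1.0          # all G-ADT datasets

    def awr_weight(advantage, inv_temperature, max_weight=100.0):
        # Generic advantage-weighted regression weight exp(beta * A);
        # the clip value is a common stability choice (e.g., in IQL-style code), not a paper setting.
        return np.minimum(np.exp(inv_temperature * advantage), max_weight)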