Rethinking Decision Transformer via Hierarchical Reinforcement Learning

Authors: Yi Ma, Jianye Hao, Hebin Liang, Chenjun Xiao

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results clearly show that the proposed algorithms significantly surpass DT on several control and navigation benchmarks.
Researcher Affiliation | Collaboration | 1 College of Intelligence and Computing, Tianjin University; 2 Huawei Noah's Ark Lab; 3 The Chinese University of Hong Kong, Shenzhen. Correspondence to: Jianye Hao <jianye.hao@tju.edu.cn>.
Pseudocode | No | The paper does not provide pseudocode or a clearly labeled algorithm block for its proposed methods (ADT, V-ADT, G-ADT).
Open Source Code | Yes | Codes for reproducing our results are provided here.
Open Datasets | Yes | We leverage datasets across several domains including Gym-MuJoCo, AntMaze, and Franka Kitchen from the offline RL benchmark D4RL (Fu et al., 2020).
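For context, the D4RL datasets cited in the row above are typically loaded through the d4rl Python package. The minimal sketch below assumes d4rl and gym are installed; the environment id 'halfcheetah-medium-v2' is only an illustrative example, not a setting taken from the paper.

    import gym
    import d4rl  # importing d4rl registers the D4RL offline environments with gym

    # Illustrative task; the paper uses Gym-MuJoCo, AntMaze, and Franka Kitchen datasets from D4RL.
    env = gym.make('halfcheetah-medium-v2')

    # Returns a dict of NumPy arrays with keys such as 'observations', 'actions',
    # 'rewards', 'next_observations', and 'terminals'.
    dataset = d4rl.qlearning_dataset(env)
    print(dataset['observations'].shape)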
Dataset Splits | No | The paper mentions training on 'trajectory data' and evaluation but does not specify explicit percentages or sample counts for training, validation, or test splits. It uses D4RL datasets, which often have standard splits, but these are not stated in the paper's text.
Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as GPU models (e.g., NVIDIA A100), CPU models, or cloud instance types with their specifications.
Software Dependencies | No | The paper mentions using 'PyTorch' and building on the 'CORL' and 'IQL'/'HIQL' codebases, but it does not provide specific version numbers for these or other software dependencies necessary for reproduction (e.g., 'PyTorch 1.9', 'Python 3.8').
Experiment Setup | Yes | We provide the lower-level actor's hyper-parameters used in our experiments in Table 5. Most hyper-parameters are set following the default configurations in DT. For the inverse temperature used in calculating the AWR loss of the lower-level actor in V-ADT, we set it to 1.0, 3.0, 6.0, 6.0, 6.0, 15.0 for the antmaze-umaze, umaze-diverse, medium-diverse, medium-play, large-diverse, large-play datasets, respectively; for other datasets, it is set to 3.0. As for G-ADT, the inverse temperature is set to 1.0 for all datasets. For the critic used in V-ADT and G-ADT, we follow the default architecture and learning settings in IQL (Kostrikov et al., 2022) and HIQL (Park et al., 2023), respectively. Detailed settings of other hyper-parameters are provided in Appendix A.2.
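To make the quoted settings concrete, here is a hypothetical sketch of the inverse-temperature configuration and the standard AWR weighting it enters. The dictionary keys mirror the dataset names quoted above, while the function and variable names (and the weight clip) are illustrative assumptions, not taken from the authors' code.

    import numpy as np

    # Inverse temperatures for the lower-level actor's AWR loss in V-ADT, as quoted above.
    V_ADT_INV_TEMPERATURE = {
        'antmaze-umaze': 1.0,
        'umaze-diverse': 3.0,
        'medium-diverse': 6.0,
        'medium-play': 6.0,
        'large-diverse': 6.0,
        'large-play': 15.0,
    }
    V_ADT_DEFAULT_INV_TEMPERATURE = 3.0  # all other V-ADT datasets
    G_ADT_INV_TEMPERATURE = 1.0          # all G-ADT datasets

    def awr_weight(advantage, inv_temperature, max_weight=100.0):
        # Generic advantage-weighted regression weight exp(beta * A);
        # the clip value is a common stability choice (e.g., in IQL-style code), not a paper setting.
        return np.minimum(np.exp(inv_temperature * advantage), max_weight)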