In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
Authors: Sili Huang, Jifeng Hu, Hechang Chen, Lichao Sun, Bo Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that IDT achieves state-of-the-art performance in long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of IDT is 36 times faster than baselines on the D4RL benchmark and 27 times faster on the Grid World benchmark. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University, China; (2) Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China; (3) Lehigh University, Bethlehem, Pennsylvania, USA. |
| Pseudocode | Yes | Appendix A. Pseudocode of In-context Decision Transformer (a hedged sketch of the hierarchical rollout appears below the table). |
| Open Source Code | Yes | Source code is available here. |
| Open Datasets | Yes | Dataset: Grid World. In this section, we first consider the discrete control environments from Grid World (Lee et al., 2022), a commonly used benchmark for recent in-context RL methods. Dataset: D4RL. D4RL (Fu et al., 2020) is a commonly used offline RL benchmark that includes continuous control tasks (a loading sketch appears below the table). |
| Dataset Splits | No | The paper mentions training and testing, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | Experiments are carried out on NVIDIA GeForce RTX 3090 GPUs and NVIDIA A10 GPUs. Besides, the CPU type is Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz. |
| Software Dependencies | No | The paper mentions software components like 'ReLU' and architectures like 'GPT model' and 'LSTM', but it does not provide specific software names with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') needed to replicate the experiment. |
| Experiment Setup | Yes | Table 2. Hyperparameters of IDT: Number of layers: 3; Number of attention heads: 3; Embedding dimension: 128; Activation function: ReLU; Steps c controlled by one high-level decision: 10 (D4RL and Large Grid World), 5 (Grid World); Batch size: 64; Dropout: 0.1; Learning rate: 1e-4; Learning rate decay: linear warmup for 1e5 steps; Grad norm clip: 0.25; Weight decay: 1e-4; Number of trajectories n forming across-episodic contexts: 4 ((Large) Dark Key-to-Door), 10 (other Grid World tasks), 4 (D4RL). A hedged configuration sketch appears below the table. |
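
The paper's actual pseudocode is in Appendix A. As a reading aid only, below is a minimal sketch of the hierarchical rollout implied by the hyperparameters above, where a high-level decision steers the low-level policy for c consecutive steps. All names and signatures (`rollout`, `high_level`, `low_level`) are hypothetical, and the classic 4-tuple `gym` step API is assumed.

```python
# Hypothetical sketch of a hierarchical rollout: a high-level module emits a
# decision that conditions the low-level module for c consecutive steps.
# This is NOT the paper's Appendix A pseudocode, just an illustration.
def rollout(env, high_level, low_level, c_steps, max_steps):
    obs = env.reset()
    total_reward, t = 0.0, 0
    while t < max_steps:
        decision = high_level(obs)  # refreshed once every c_steps
        for _ in range(c_steps):    # low-level acts under the current decision
            action = low_level(obs, decision)
            obs, reward, done, _ = env.step(action)  # classic gym API assumed
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                return total_reward
    return total_reward
```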
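Both benchmarks are public. For D4RL, a minimal loading sketch follows, assuming the standard `d4rl` package alongside `gym`; the task name `halfcheetah-medium-v2` is an illustrative choice, not a claim about the paper's exact task list.

```python
import gym
import d4rl  # noqa: F401 -- importing registers the D4RL environments with gym

# 'halfcheetah-medium-v2' is an assumed example task, not from the paper.
env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()  # dict of numpy arrays

# Standard D4RL keys include observations, actions, rewards, and terminals.
print(dataset["observations"].shape)
print(dataset["actions"].shape)
```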
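The Table 2 values can be collected into a plain configuration dictionary, as sketched below. The key names are our own; the paper does not prescribe a config format.

```python
# Hyperparameters reported in Table 2 of the paper, gathered into one dict.
# Key names are hypothetical; values are as reported.
idt_hyperparameters = {
    "num_layers": 3,
    "num_attention_heads": 3,
    "embedding_dim": 128,
    "activation": "ReLU",
    # c: steps controlled by one high-level decision
    "c_steps": {"d4rl": 10, "large_grid_world": 10, "grid_world": 5},
    "batch_size": 64,
    "dropout": 0.1,
    "learning_rate": 1e-4,
    "lr_warmup_steps": int(1e5),  # linear warmup
    "grad_norm_clip": 0.25,
    "weight_decay": 1e-4,
    # n: trajectories forming one across-episodic context
    "context_trajectories": {
        "dark_key_to_door": 4,  # also the (Large) variant
        "other_grid_world": 10,
        "d4rl": 4,
    },
}
```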