In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought
Authors: Sili Huang, Jifeng Hu, Hechang Chen, Lichao Sun, Bo Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that IDT achieves state-of-the-art performance in long-horizon tasks over current in-context RL methods. In particular, the online evaluation time of IDT is 36 times faster than baselines on the D4RL benchmark and 27 times faster on the Grid World benchmark. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University, China; (2) Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China; (3) Lehigh University, Bethlehem, Pennsylvania, USA. |
| Pseudocode | Yes | Appendix A. Pseudocode of In-context Decision Transformer (a hedged sketch of the hierarchical rollout appears below the table). |
| Open Source Code | Yes | Source code is available here. |
| Open Datasets | Yes | Dataset: Grid World. In this section, we first consider the discrete control environments from Grid World (Lee et al., 2022), a commonly used benchmark for recent in-context RL methods. Dataset: D4RL. D4RL (Fu et al., 2020) is a commonly used offline RL benchmark that includes continuous control tasks (a loading sketch appears below the table). |
| Dataset Splits | No | The paper mentions training and testing, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | Experiments are carried out on NVIDIA GeForce RTX 3090 GPUs and NVIDIA A10 GPUs. Besides, the CPU type is Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz. |
| Software Dependencies | No | The paper mentions software components like 'ReLU' and architectures like 'GPT model' and 'LSTM', but it does not provide specific software names with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') needed to replicate the experiment. |
| Experiment Setup | Yes | Table 2. Hyperparameters of IDT: Number of layers: 3; Number of attention heads: 3; Embedding dimension: 128; Activation function: ReLU; Steps c controlled by one high-level decision: 10 (D4RL and Large Grid World), 5 (Grid World); Batch size: 64; Dropout: 0.1; Learning rate: 1e-4; Learning rate decay: linear warmup for 1e5 steps; Grad norm clip: 0.25; Weight decay: 1e-4; Number of trajectories n forming across-episodic contexts: 4 ((Large) Dark Key-to-Door), 10 (other Grid World tasks), 4 (D4RL). A hedged configuration sketch appears below the table. |
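
The paper's actual pseudocode is in Appendix A. As a reading aid only, below is a minimal sketch of the hierarchical rollout implied by the hyperparameters above, where a high-level decision steers the low-level policy for c consecutive steps. All names and signatures (`rollout`, `high_level`, `low_level`) are hypothetical, and the classic 4-tuple `gym` step API is assumed.

```python
# Hypothetical sketch of a hierarchical rollout: a high-level module emits a
# decision that conditions the low-level module for c consecutive steps.
# This is NOT the paper's Appendix A pseudocode, just an illustration.
def rollout(env, high_level, low_level, c_steps, max_steps):
    obs = env.reset()
    total_reward, t = 0.0, 0
    while t < max_steps:
        decision = high_level(obs)  # refreshed once every c_steps
        for _ in range(c_steps):    # low-level acts under the current decision
            action = low_level(obs, decision)
            obs, reward, done, _ = env.step(action)  # classic gym API assumed
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                return total_reward
    return total_reward
```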
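Both benchmarks are public. For D4RL, a minimal loading sketch follows, assuming the standard `d4rl` package alongside `gym`; the task name `halfcheetah-medium-v2` is an illustrative choice, not a claim about the paper's exact task list.

```python
import gym
import d4rl  # noqa: F401 -- importing registers the D4RL environments with gym

# 'halfcheetah-medium-v2' is an assumed example task, not from the paper.
env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()  # dict of numpy arrays

# Standard D4RL keys include observations, actions, rewards, and terminals.
print(dataset["observations"].shape)
print(dataset["actions"].shape)
```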
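The Table 2 values can be collected into a plain configuration dictionary, as sketched below. The key names are our own; the paper does not prescribe a config format.

```python
# Hyperparameters reported in Table 2 of the paper, gathered into one dict.
# Key names are hypothetical; values are as reported.
idt_hyperparameters = {
    "num_layers": 3,
    "num_attention_heads": 3,
    "embedding_dim": 128,
    "activation": "ReLU",
    # c: steps controlled by one high-level decision
    "c_steps": {"d4rl": 10, "large_grid_world": 10, "grid_world": 5},
    "batch_size": 64,
    "dropout": 0.1,
    "learning_rate": 1e-4,
    "lr_warmup_steps": int(1e5),  # linear warmup
    "grad_norm_clip": 0.25,
    "weight_decay": 1e-4,
    # n: trajectories forming one across-episodic context
    "context_trajectories": {
        "dark_key_to_door": 4,  # also the (Large) variant
        "other_grid_world": 10,
        "d4rl": 4,
    },
}
```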