Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling
Authors: Sili Huang, Jifeng Hu, Zhejian Yang, Liwei Yang, Tao Luo, Hechang Chen, Lichao Sun, Bo Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that DM-H achieves state-of-the-art results in both long and short-term tasks, such as the D4RL, Grid World, and Tmaze benchmarks. Regarding efficiency, online testing of DM-H on the long-term task is 28 times faster than the transformer-based baselines. |
| Researcher Affiliation | Collaboration | Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education; School of Artificial Intelligence, Jilin University, China; Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore; Lehigh University, Bethlehem, Pennsylvania, USA |
| Pseudocode | Yes | Algorithm 1: Decision Mamba-Hybrid. Input: a dataset of trajectories; max iterations M for the training phase; max episodes m for the testing phase; the number of trajectories n in the across-episodic context used by the Mamba model; the number of action steps c per sub-goal. Output: the generated actions. (A hedged sketch of this rollout appears after the table.) |
| Open Source Code | Yes | Source code and more hyperparameters are described in Appendix B. We provide our code at... |
| Open Datasets | Yes | Dataset: Grid World. Dataset: Tmaze. Dataset: D4RL [13] is a commonly used offline RL benchmark, including continuous control tasks. |
| Dataset Splits | No | The paper mentions "offline training" and "sampling minibatches of trajectories" but does not specify explicit train/validation/test splits by percentages or sample counts for their experiments. |
| Hardware Specification | Yes | Experiments are carried out on NVIDIA GeForce RTX 3090 GPUs and NVIDIA A10 GPUs. The CPU is an Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | In D4RL, Tmaze, and Large Grid World, the transformer model generates c = 20 steps of actions while the Mamba model generates one sub-goal. In the conventional Grid World, c = 5 because the task is too short. Table 3 summarizes the hyperparameters used in the DM-H model. (A hypothetical config capturing these c values appears below.) |
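
For readers reconstructing Algorithm 1 from the pseudocode row, the following is a minimal sketch of the hybrid rollout it describes: the Mamba model proposes a sub-goal from the across-episodic context, and the transformer then generates c low-level actions toward it. The `env`, `mamba_model`, and `transformer_model` interfaces are illustrative assumptions, not the authors' actual API.

```python
# Hedged sketch of the DM-H test-time rollout from Algorithm 1.
# All interfaces (env, mamba_model, transformer_model) are hypothetical
# stand-ins; the paper's actual implementation may differ.

def dmh_rollout(env, mamba_model, transformer_model, n, c, max_episodes):
    """Every c steps, Mamba proposes a sub-goal from the across-episodic
    context; the transformer then generates c low-level actions."""
    context = []  # across-episodic context: the most recent trajectories
    for _ in range(max_episodes):
        obs, trajectory, done = env.reset(), [], False
        while not done:
            # High-level step: condition Mamba on the last n trajectories
            # plus the current observation to produce one sub-goal.
            sub_goal = mamba_model(context[-n:], obs)
            # Low-level steps: the transformer emits c actions per sub-goal.
            for _ in range(c):
                action = transformer_model(trajectory, sub_goal, obs)
                obs, reward, done, _ = env.step(action)
                trajectory.append((obs, action, reward))
                if done:
                    break
        context.append(trajectory)  # grow the across-episodic context
    return context
```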
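
The c values quoted in the setup row can be captured in a small per-task config. The mapping below is a hypothetical reconstruction of those reported hyperparameters; the task keys and helper name are illustrative, not the authors' naming.

```python
# Hypothetical per-task sub-goal horizon c, reconstructed from the
# experiment-setup row above; task keys are illustrative.
SUBGOAL_HORIZON_C = {
    "d4rl": 20,              # transformer generates 20 actions per sub-goal
    "tmaze": 20,
    "large_grid_world": 20,
    "grid_world": 5,         # conventional Grid World tasks are too short for c = 20
}

def horizon_for(task: str) -> int:
    """Look up c for a task, defaulting to 20 as in the long-horizon setups."""
    return SUBGOAL_HORIZON_C.get(task, 20)
```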