Bootstrapped Transformer for Offline Reinforcement Learning
Authors: Kerong Wang, Hanye Zhao, Xufang Luo, Kan Ren, Weinan Zhang, Dongsheng Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on two offline RL benchmarks and demonstrate that our model can largely remedy the existing offline RL training limitations and beat other strong baseline methods. |
| Researcher Affiliation | Collaboration | Kerong Wang (Shanghai Jiao Tong University); Hanye Zhao (Shanghai Jiao Tong University); Xufang Luo (Microsoft Research Asia); Kan Ren (Microsoft Research Asia); Weinan Zhang (Shanghai Jiao Tong University); Dongsheng Li (Microsoft Research Asia) |
| Pseudocode | Yes | Algorithm 1 Training Procedure of BooT |
| Open Source Code | Yes | The codes and supplementary materials are available at https://seqml.github.io/bootorl. |
| Open Datasets | Yes | We evaluate our BooT algorithm on the dataset of continuous control tasks from the D4RL offline dataset [9]. |
| Dataset Splits | No | The paper mentions using the D4RL dataset and random seeds for training and evaluation, but does not explicitly specify the training/validation/test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | All the experiments are run on a server with Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, and NVIDIA V100 GPU (32GB memory) with 8 cores. |
| Software Dependencies | No | The paper mentions using AdamW optimizer, but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, CUDA version, or specific library versions). |
| Experiment Setup | Yes | We use AdamW optimizer with a learning rate of 6e-4 for TT training. The learning rate warms up for 10% of the total training steps, and linearly decays to 0. We train the model for 100 epochs with batch size 64 for Gym domain and 32 for Adroit domain. For discretization, we discretize states and actions into 1000 bins uniformly. |
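
The quoted experiment setup (AdamW at a 6e-4 peak learning rate, linear warmup over 10% of training steps followed by linear decay to zero, and uniform 1000-bin discretization of states and actions) is concrete enough to sketch in code. The snippet below is a minimal PyTorch/NumPy illustration of that recipe, not the authors' released implementation; the function names `make_optimizer_and_scheduler` and `discretize`, and the per-dimension `low`/`high` bounds, are assumptions introduced for the example.

```python
import numpy as np
import torch

# Values quoted from the paper's experiment setup.
LEARNING_RATE = 6e-4
NUM_EPOCHS = 100
BATCH_SIZE = 64         # Gym domain; 32 for the Adroit domain
WARMUP_FRACTION = 0.10  # warm up for 10% of total training steps
NUM_BINS = 1000         # uniform bins for states and actions

def make_optimizer_and_scheduler(model, total_steps):
    """AdamW with linear warmup over 10% of steps, then linear decay to 0."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
    warmup_steps = max(1, int(WARMUP_FRACTION * total_steps))

    def lr_lambda(step):
        if step < warmup_steps:
            return step / warmup_steps                       # linear warmup
        decay_steps = max(1, total_steps - warmup_steps)
        return max(0.0, (total_steps - step) / decay_steps)  # linear decay to 0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

def discretize(values, low, high, num_bins=NUM_BINS):
    """Map continuous values in [low, high] to integer bin indices 0..num_bins-1."""
    values = np.clip(values, low, high)
    fraction = (values - low) / (high - low)
    return np.minimum((fraction * num_bins).astype(np.int64), num_bins - 1)
```

With a dataset of N trajectories, `total_steps` would be roughly `NUM_EPOCHS * (N // BATCH_SIZE)`, with `scheduler.step()` called once per optimizer update; clamping the top bin index in `discretize` keeps values exactly at `high` inside the valid token range.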