Q-value Regularized Transformer for Offline Reinforcement Learning

Authors: Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods, highlighting the potential of QT to enhance the state-of-the-art in offline RL.
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, China; 2 Shanghai AI Laboratory, China; 3 Sun Yat-sen University, China; 4 JD Explore Academy, China; 5 Nanyang Technological University, Singapore.
Pseudocode | Yes | Algorithm 1 QT: Q-value regularized Transformer
Open Source Code | Yes | Our code is available at: https://github.com/charleshsc/QT
Open Datasets | Yes | We present an extensive evaluation of our proposed QT model using the widely recognized D4RL benchmark (Fu et al., 2020).
Dataset Splits | No | The paper evaluates on D4RL benchmark tasks and reports normalized scores, but it does not state split percentages (e.g., 80/10/10) or sample counts for training, validation, or test sets.
Hardware Specification | No | The paper does not specify the hardware used for its experiments, such as CPU or GPU models or cloud computing instance types.
Software Dependencies | No | The paper mentions using the Adam optimizer and building on the minGPT open-source code, but it does not give version numbers for these or for other software libraries (e.g., PyTorch, Python).
Experiment Setup | Yes | Table 5 hyper-parameters of QT: number of layers 4, number of attention heads 4, embedding dimension 256, nonlinearity ReLU, batch size 256, context length K 20, dropout 0.1, learning rate 3.0e-4.
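
For the Pseudocode row above: Algorithm 1 couples conditional sequence modeling with a Q-value regularizer that pushes the predicted actions toward high action-values. The following is a minimal sketch of such a training step, assuming a PyTorch policy and critic; the module names (`transformer_policy`, `q_network`), the batch layout, and the weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def qt_policy_update(transformer_policy, q_network, batch, optimizer, alpha=1.0):
    """One hypothetical QT-style policy update: sequence-modeling loss
    plus a Q-value regularization term (sketch, not the official code)."""
    rtg, states, actions = batch["returns_to_go"], batch["states"], batch["actions"]

    # Conditional sequence modeling: predict actions from the
    # (return-to-go, state, action) context of length K.
    pred_actions = transformer_policy(rtg, states, actions)
    sm_loss = F.mse_loss(pred_actions, actions)

    # Q-value regularization: encourage predicted actions to score highly
    # under a separately learned action-value function.
    q_reg = -q_network(states, pred_actions).mean()

    loss = sm_loss + alpha * q_reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```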
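
For the Open Datasets row: D4RL tasks are loaded as whole offline datasets and results are reported as normalized scores, which is consistent with the absence of explicit train/validation/test splits noted above. A minimal sketch, assuming the `gym` and `d4rl` packages are installed; the task name is only an example.

```python
import gym
import d4rl  # importing d4rl registers the offline benchmark environments with gym

# Example D4RL task; the paper evaluates across many such datasets.
env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # observations, actions, rewards, terminals, ...
print({key: value.shape for key, value in dataset.items()})

# D4RL reports normalized scores: 0 = random policy, 100 = expert policy.
episode_return = 2500.0  # hypothetical raw evaluation return
print(env.get_normalized_score(episode_return) * 100)
```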
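
For the Experiment Setup row: the Table 5 values map directly onto a small configuration object for a GPT-style backbone. A minimal sketch with the reported hyper-parameters; the dataclass and field names are assumptions for illustration, not taken from the released code.

```python
from dataclasses import dataclass

@dataclass
class QTHyperParams:
    # Values as reported in Table 5 of the paper.
    n_layers: int = 4
    n_heads: int = 4
    embed_dim: int = 256
    activation: str = "relu"
    batch_size: int = 256
    context_length: int = 20      # context length K
    dropout: float = 0.1
    learning_rate: float = 3.0e-4

print(QTHyperParams())
```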