Q-value Regularized Transformer for Offline Reinforcement Learning
Authors: Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods, highlighting the potential of QT to enhance the state-of-the-art in offline RL. |
| Researcher Affiliation | Collaboration | Shanghai Jiao Tong University, China; Shanghai AI Laboratory, China; Sun Yat-sen University, China; JD Explore Academy, China; Nanyang Technological University, Singapore. |
| Pseudocode | Yes | Algorithm 1 QT: Q-value regularized Transformer (a hedged sketch of the training objective appears after this table). |
| Open Source Code | Yes | Our code is available at: https://github.com/charleshsc/QT |
| Open Datasets | Yes | We present an extensive evaluation of our proposed QT model using the widely recognized D4RL benchmark (Fu et al., 2020). (A sketch of D4RL data loading and score normalization appears after this table.) |
| Dataset Splits | No | The paper mentions evaluating on the D4RL benchmark tasks and normalizing scores, but it does not explicitly state the dataset split percentages (e.g., 80/10/10) or specific sample counts used for training, validation, or testing splits. |
| Hardware Specification | No | The paper does not specify any hardware used for experiments, such as particular CPU or GPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions using Adam optimizer and building on min GPT open-source code, but it does not provide specific version numbers for these or other software libraries (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | Table 5 (hyper-parameters of QT): number of layers 4; number of attention heads 4; embedding dimension 256; nonlinearity function ReLU; batch size 256; context length K = 20; dropout 0.1; learning rate 3.0e-4 (a minimal config sketch collecting these values follows the table). |
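The pseudocode row points to Algorithm 1 (QT), which augments a Decision-Transformer-style conditional sequence modeling loss with a Q-value regularization term. The following is a minimal PyTorch sketch of one such training step, not the authors' released implementation: the names `policy`, `q_net`, `qt_training_step`, and the weighting coefficient `alpha` are illustrative assumptions.

```python
# Hedged sketch of a Q-value regularized training step for a
# Decision-Transformer-style policy; module and argument names are assumed.
import torch
import torch.nn.functional as F

def qt_training_step(policy, q_net, batch, optimizer, alpha=1.0):
    states, actions, returns_to_go, timesteps = batch

    # Conditional sequence modeling (behavior-cloning) term on predicted actions.
    pred_actions = policy(states, actions, returns_to_go, timesteps)
    bc_loss = F.mse_loss(pred_actions, actions)

    # Q-value regularization: encourage predicted actions that the learned
    # critic scores highly (the critic itself would be trained separately).
    q_loss = -q_net(states, pred_actions).mean()

    loss = bc_loss + alpha * q_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```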
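The datasets row references the D4RL benchmark and normalized scores. A minimal sketch of loading a D4RL dataset and converting a raw evaluation return into the commonly reported normalized score, assuming the standard `gym` and `d4rl` packages; the environment name and return value are placeholders.

```python
# Minimal sketch of D4RL data loading and score normalization.
import gym
import d4rl  # importing d4rl registers the benchmark environments with gym

env = gym.make("hopper-medium-v2")        # example D4RL task
dataset = d4rl.qlearning_dataset(env)     # dict of observations, actions, rewards, ...

raw_return = 2500.0                       # placeholder: return achieved by the policy
normalized = 100.0 * env.get_normalized_score(raw_return)
print(f"Normalized D4RL score: {normalized:.1f}")
```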
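The experiment-setup row lists the hyper-parameters reported in Table 5. A minimal configuration sketch collecting those values under minGPT-style key names; the key names are assumptions and may differ from the released code.

```python
# Hedged mapping of Table 5's reported hyper-parameters onto an illustrative config dict.
qt_config = {
    "n_layer": 4,            # number of transformer layers
    "n_head": 4,             # number of attention heads
    "embed_dim": 256,        # embedding dimension
    "activation": "relu",    # nonlinearity function
    "batch_size": 256,
    "context_length": 20,    # context length K
    "dropout": 0.1,
    "learning_rate": 3e-4,   # Adam optimizer per the paper
}
```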