Q-value Regularized Transformer for Offline Reinforcement Learning

Authors: Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods, highlighting the potential of QT to enhance the state-of-the-art in offline RL.
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, China; 2 Shanghai AI Laboratory, China; 3 Sun Yat-sen University, China; 4 JD Explore Academy, China; 5 Nanyang Technological University, Singapore.
Pseudocode | Yes | Algorithm 1 QT: Q-value regularized Transformer
Open Source Code | Yes | Our code is available at: https://github.com/charleshsc/QT
Open Datasets | Yes | We present an extensive evaluation of our proposed QT model using the widely recognized D4RL benchmark (Fu et al., 2020).
Dataset Splits | No | The paper evaluates on D4RL benchmark tasks and reports normalized scores, but it does not state split percentages (e.g., 80/10/10) or sample counts for training, validation, or test sets.
Hardware Specification | No | The paper does not specify the hardware used for its experiments, such as CPU or GPU models or cloud computing instance types.
Software Dependencies | No | The paper mentions using the Adam optimizer and building on the minGPT open-source code, but it does not give version numbers for these or for other software libraries (e.g., PyTorch, Python).
Experiment Setup | Yes | Table 5 hyper-parameters of QT: number of layers 4, number of attention heads 4, embedding dimension 256, nonlinearity ReLU, batch size 256, context length K 20, dropout 0.1, learning rate 3.0e-4.
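
For the Pseudocode row above: Algorithm 1 couples conditional sequence modeling with a Q-value regularizer that pushes the predicted actions toward high action-values. The following is a minimal sketch of such a training step, assuming a PyTorch policy and critic; the module names (`transformer_policy`, `q_network`), the batch layout, and the weight `alpha` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def qt_policy_update(transformer_policy, q_network, batch, optimizer, alpha=1.0):
    """One hypothetical QT-style policy update: sequence-modeling loss
    plus a Q-value regularization term (sketch, not the official code)."""
    rtg, states, actions = batch["returns_to_go"], batch["states"], batch["actions"]

    # Conditional sequence modeling: predict actions from the
    # (return-to-go, state, action) context of length K.
    pred_actions = transformer_policy(rtg, states, actions)
    sm_loss = F.mse_loss(pred_actions, actions)

    # Q-value regularization: encourage predicted actions to score highly
    # under a separately learned action-value function.
    q_reg = -q_network(states, pred_actions).mean()

    loss = sm_loss + alpha * q_reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```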
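
For the Open Datasets row: D4RL tasks are loaded as whole offline datasets and results are reported as normalized scores, which is consistent with the absence of explicit train/validation/test splits noted above. A minimal sketch, assuming the `gym` and `d4rl` packages are installed; the task name is only an example.

```python
import gym
import d4rl  # importing d4rl registers the offline benchmark environments with gym

# Example D4RL task; the paper evaluates across many such datasets.
env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # observations, actions, rewards, terminals, ...
print({key: value.shape for key, value in dataset.items()})

# D4RL reports normalized scores: 0 = random policy, 100 = expert policy.
episode_return = 2500.0  # hypothetical raw evaluation return
print(env.get_normalized_score(episode_return) * 100)
```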
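
For the Experiment Setup row: the Table 5 values map directly onto a small configuration object for a GPT-style backbone. A minimal sketch with the reported hyper-parameters; the dataclass and field names are assumptions for illustration, not taken from the released code.

```python
from dataclasses import dataclass

@dataclass
class QTHyperParams:
    # Values as reported in Table 5 of the paper.
    n_layers: int = 4
    n_heads: int = 4
    embed_dim: int = 256
    activation: str = "relu"
    batch_size: int = 256
    context_length: int = 20      # context length K
    dropout: float = 0.1
    learning_rate: float = 3.0e-4

print(QTHyperParams())
```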