Critic-Guided Decision Transformer for Offline Reinforcement Learning

Authors: Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on stochastic environments and D4RL benchmark datasets demonstrate the superiority of CGDT over traditional RCSL methods. These results highlight the potential of CGDT to advance the state of the art in offline RL and extend the applicability of RCSL to a wide range of RL tasks.
Researcher Affiliation | Collaboration | Yuanfu Wang*¹,², Chao Yang*², Ying Wen¹, Yu Liu²,³, Yu Qiao² (¹Shanghai Jiao Tong University, ²Shanghai Artificial Intelligence Laboratory, ³SenseTime Research)
Pseudocode | Yes | Algorithm 1: Critic-Guided Decision Transformer
Open Source Code | No | The paper does not contain an explicit statement or a direct link indicating that the source code for the methodology is publicly available.
Open Datasets | Yes | We conduct further experiments on the D4RL datasets (Fu et al. 2020).
Dataset Splits | No | While the paper mentions using "validation errors as a means to detect overfitting during critic training," it does not specify the percentages, counts, or methodology of the training, validation, and test splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments.
Software Dependencies | No | The paper does not list the software dependencies (libraries, frameworks, or version numbers) needed to replicate the experiments.
Experiment Setup | Yes | The algorithm implementation details are summarized in Algorithm 1. Initially, we set the hyperparameters τc and τp to 0.5. By varying τc and τp within the range [0.3, 0.7], we control the asymmetries during critic training and policy training, respectively.
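
For context on the Open Datasets row: the D4RL benchmark is distributed through the open-source d4rl package. The snippet below is a minimal loading sketch, not taken from the paper; the environment name is an illustrative choice, not a setting the authors report.

```python
# Minimal D4RL loading sketch (assumes the open-source `d4rl` and `gym` packages);
# the environment name is illustrative, not a configuration from the paper.
import gym
import d4rl  # importing d4rl registers the D4RL environments with gym

env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, terminals, next_observations
print(dataset["observations"].shape, dataset["actions"].shape)
```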
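The Experiment Setup row describes τc and τp as parameters that control asymmetry during critic and policy training. The sketch below illustrates the general mechanism by which a single τ in (0, 1) skews a regression loss (an expectile-style asymmetric L2); this is an assumption about the form of the objective used for illustration, not the paper's exact loss.

```python
import torch

def asymmetric_l2_loss(u: torch.Tensor, tau: float) -> torch.Tensor:
    # u = target - prediction. Positive residuals are weighted by tau and
    # negative residuals by (1 - tau); tau = 0.5 recovers the ordinary MSE,
    # while larger tau pushes the estimate toward upper expectiles of the target.
    weight = torch.where(u > 0, torch.full_like(u, tau), torch.full_like(u, 1.0 - tau))
    return (weight * u.pow(2)).mean()

# Illustrative call with tau inside the [0.3, 0.7] range mentioned in the table;
# tau = 0.5 matches the symmetric initialization the paper reports.
q_pred = torch.randn(256)    # stand-in critic outputs
q_target = torch.randn(256)  # stand-in regression targets
critic_loss = asymmetric_l2_loss(q_target - q_pred, tau=0.5)
```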