Online Decision Transformer

Authors: Qinqing Zheng, Amy Zhang, Aditya Grover

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we validate our overall framework by comparing its performance with state-of-the-art algorithms on the D4RL benchmark (Fu et al., 2020). We find that our relative improvements due to our finetuning strategy outperform other baselines (Nair et al., 2020; Kostrikov et al., 2021b), while exhibiting competitive absolute performance when accounting for the pretraining results of the base model. Finally, we supplement our main results with rigorous ablations and additional experimental designs to justify and validate the key components of our approach.
Researcher Affiliation | Collaboration | (1) Meta AI Research, (2) University of California, Berkeley, (3) University of California, Los Angeles.
Pseudocode | Yes | Algorithm 1: Online Decision Transformer; Algorithm 2: ODT Training (a training-loop sketch follows the table below).
Open Source Code | No | No explicit statement by the authors providing their *own* source code for the methodology was found. The paper mentions: "We use the official PyTorch implementation for DT, the official JAX implementation for IQL, and the PyTorch implementation (Yarats & Kostrikov, 2020) for SAC." This refers to third-party code for the baselines. The link "For more results, visit https://sites.google.com/view/onlinedt/home." is a project homepage, not a code repository.
Open Datasets | Yes | For answering both these questions, we focus on two types of tasks with offline datasets from the D4RL benchmark (Fu et al., 2020). (A minimal data-loading example follows the table below.)
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, cloud instances) used for running experiments are mentioned.
Software Dependencies | No | The paper mentions software components like "PyTorch," "JAX," "LAMB optimizer," and "Adam optimizer," but does not specify version numbers for any of them.
Experiment Setup | Yes | The complete list of hyperparameters of ODT is summarized in Appendix C. Table C.1 lists the common hyperparameters and Table C.2 lists the domain-specific ones. For all the experiments, we optimize the policy parameter θ with the LAMB optimizer (You et al., 2019)... (An optimizer-setup sketch follows the table below.)
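
The Pseudocode row refers to the paper's Algorithm 1 (Online Decision Transformer) and Algorithm 2 (ODT Training), which pretrain on offline data and then finetune on self-collected rollouts. Below is a minimal Python sketch of that loop; the helper callables (`sample_batch`, `train_step`, `collect_rollout`, `relabel_returns`) and the replay-buffer handling are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the ODT pretrain-then-finetune loop (Algorithms 1-2 in the paper).
# The callables passed in are placeholders supplied by the caller, not the authors' API.

def online_decision_transformer(policy, offline_trajectories, env,
                                sample_batch, train_step, collect_rollout, relabel_returns,
                                pretrain_steps, online_rounds, updates_per_round, explore_rtg):
    # Offline stage: sequence-modeling updates (log-likelihood plus an entropy term) on offline data.
    replay_buffer = list(offline_trajectories)
    for _ in range(pretrain_steps):
        train_step(policy, sample_batch(replay_buffer))

    # Online stage: alternate exploratory rollouts and further finetuning updates.
    for _ in range(online_rounds):
        trajectory = collect_rollout(policy, env, target_return=explore_rtg)
        replay_buffer.append(relabel_returns(trajectory))  # hindsight return-to-go relabeling
        for _ in range(updates_per_round):
            train_step(policy, sample_batch(replay_buffer))
    return policy
```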
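
The Open Datasets row points to the D4RL benchmark (Fu et al., 2020). As a rough how-to (not described in the paper itself), D4RL offline datasets are typically loaded via the public `d4rl` and `gym` packages; the `hopper-medium-v2` task below is just one example.

```python
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

# Load one of the MuJoCo locomotion tasks from the D4RL benchmark.
env = gym.make("hopper-medium-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', 'terminals', ...

print(dataset["observations"].shape, dataset["actions"].shape)
```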
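
The Experiment Setup row states that the policy parameters θ are optimized with LAMB (You et al., 2019), and the Software Dependencies row also mentions Adam. The sketch below assumes the third-party `torch-optimizer` package for LAMB, uses a toy module in place of the ODT transformer, and the learning rates, weight decay, and the Adam-for-temperature split are placeholders rather than values taken from Tables C.1 and C.2.

```python
import torch
import torch_optimizer  # third-party package providing a LAMB implementation (assumption)

# Toy stand-ins for the ODT transformer policy and a learnable entropy temperature.
policy = torch.nn.Linear(17, 6)
log_temperature = torch.zeros(1, requires_grad=True)

# LAMB for the policy parameters, as reported in the paper; lr/weight_decay are placeholders.
policy_optimizer = torch_optimizer.Lamb(policy.parameters(), lr=1e-4, weight_decay=1e-4)

# Adam for the temperature parameter (an assumed split; the paper only names both optimizers).
temperature_optimizer = torch.optim.Adam([log_temperature], lr=1e-4)
```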