Online Decision Transformer
Authors: Qinqing Zheng, Amy Zhang, Aditya Grover
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we validate our overall framework by comparing its performance with state-of-the-art algorithms on the D4RL benchmark (Fu et al., 2020). We find that our relative improvements due to our finetuning strategy outperform other baselines (Nair et al., 2020; Kostrikov et al., 2021b), while exhibiting competitive absolute performance when accounting for the pretraining results of the base model. Finally, we supplement our main results with rigorous ablations and additional experimental designs to justify and validate the key components of our approach. |
| Researcher Affiliation | Collaboration | 1Meta AI Research 2University of California, Berkeley 3University of California, Los Angeles. |
| Pseudocode | Yes | Algorithm 1: Online Decision Transformer; Algorithm 2: ODT Training |
| Open Source Code | No | No explicit statement by the authors providing their *own* source code for the methodology was found. The paper mentions: "We use the official Pytorch implementation2 for DT, the official JAX implementation3 for IQL, and the Pytorch implementation4 (Yarats & Kostrikov, 2020) for SAC." This refers to third-party code for baselines. The link "For more results, visit https://sites.google.com/view/onlinedt/home." is a project homepage, not a code repository. |
| Open Datasets | Yes | For answering both these questions, we focus on two types of tasks with offline datasets from the D4RL benchmark (Fu et al., 2020). |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, cloud instances) used for running experiments are mentioned. |
| Software Dependencies | No | The paper mentions software components like "Pytorch," "JAX," "LAMB optimizer," and "Adam optimizer," but does not specify version numbers for any of them. |
| Experiment Setup | Yes | The complete list of hyperparameters of ODT are summarized in Appendix C. Table C.1 lists the common hyperparameters and Table C.2 lists the domain specific ones. For all the experiments, we optimize the policy parameter θ by the LAMB optimizer (You et al., 2019)... |