Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Online Decision Transformer
Authors: Qinqing Zheng, Amy Zhang, Aditya Grover
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we validate our overall framework by comparing its performance with state-of-the-art algorithms on the D4RL benchmark (Fu et al., 2020). We find that our relative improvements due to our finetuning strategy outperform other baselines (Nair et al., 2020; Kostrikov et al., 2021b), while exhibiting competitive absolute performance when accounting for the pretraining results of the base model. Finally, we supplement our main results with rigorous ablations and additional experimental designs to justify and validate the key components of our approach. |
| Researcher Affiliation | Collaboration | 1Meta AI Research 2University of California, Berkeley 3University of California, Los Angeles. |
| Pseudocode | Yes | Algorithm 1: Online Decision Transformer; Algorithm 2: ODT Training |
| Open Source Code | No | No explicit statement by the authors providing their *own* source code for the methodology was found. The paper mentions: "We use the official Pytorch implmentation2 for DT, the official JAX implementation3 for IQL, and the Pytorch implementation4 (Yarats & Kostrikov, 2020) for SAC." This refers to third-party code for baselines. The link "For more results, visit https://sites.google.com/view/onlinedt/home." is a project homepage, not a code repository. |
| Open Datasets | Yes | For answering both these questions, we focus on two types of tasks with offline datasets from the D4RL benchmark (Fu et al., 2020). |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, cloud instances) used for running experiments are mentioned. |
| Software Dependencies | No | The paper mentions software components like "Pytorch," "JAX," the "LAMB optimizer," and the "Adam optimizer," but does not specify version numbers for any of them. |
| Experiment Setup | Yes | The complete list of hyperparameters of ODT are summarized in Appendix C. Table C.1 lists the common hyperparameters and Table C.2 lists the domain specific ones. For all the experiments, we optimize the policy parameter θ by the LAMB optimizer (You et al., 2019)... |
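Since the Experiment Setup row notes that the policy parameters are optimized with the LAMB optimizer (You et al., 2019), a minimal sketch of a single LAMB update step may help clarify what that entails. This is an illustrative pure-Python implementation of the published LAMB update rule, not code from the paper; all names (`lamb_step`, the default hyperparameter values) are assumptions chosen for the example.

```python
import math

def lamb_step(w, g, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=1e-4):
    """One LAMB update for a parameter vector w with gradient g.

    m, v are the running first/second moment estimates (same length as w);
    t is the 1-based step count. Returns the updated (w, m, v).
    """
    # Adam-style exponential moving averages of the gradient and its square
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    # Bias correction, as in Adam
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Adam direction plus decoupled weight decay
    r = [mh / (math.sqrt(vh) + eps) + weight_decay * wi
         for mh, vh, wi in zip(m_hat, v_hat, w)]
    # Layer-wise trust ratio: scale the step by ||w|| / ||r||
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    r_norm = math.sqrt(sum(ri * ri for ri in r))
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    w = [wi - lr * trust * ri for wi, ri in zip(w, r)]
    return w, m, v
```

In a real training loop this update is applied per layer (the trust ratio is computed layer-wise), which is what makes LAMB robust to the large batch sizes used when training transformer policies.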