Improving Offline RL by Blending Heuristics

Authors: Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we run HUBL with the four aforementioned offline RL methods CQL, TD3+BC, IQL, and ATAC, and show that enhancing these SoTA algorithms with HUBL can improve their performance by 9% on average across 27 datasets of D4RL (Fu et al., 2020) and Meta-World (Yu et al., 2020).
Researcher Affiliation | Collaboration | Sinong Geng (Princeton University, Princeton, NJ); Aldo Pacchiano (Boston University and Broad Institute of MIT and Harvard, Boston, MA); Andrey Kolobov (Microsoft Research, Redmond, WA); Ching-An Cheng (Microsoft Research, Redmond, WA)
Pseudocode | Yes | Algorithm 1 (HUBL + Offline RL): 1: Input: dataset D = {(s, a, s', r, γ)}; 2: compute h_t for each trajectory in D; 3: compute λ_t for each trajectory in D; 4: relabel r and γ using h_t and λ_t as r̃ and γ̃, creating D̃ = {(s, a, s', r̃, γ̃)}; 5: π̂ ← Offline RL on D̃. (A Python sketch of this relabeling step follows the table.)
Open Source Code | No | The paper lists code sources for the base offline RL methods (ATAC, CQL, IQL, TD3+BC) in Table 3. However, it does not provide a direct link or an explicit statement that the authors' implementation of HUBL itself is open-source or publicly available.
Open Datasets | Yes | We study 27 benchmark datasets in D4RL and Meta-World. ... on 27 datasets of D4RL (Fu et al., 2020) and Meta-World (Yu et al., 2020).
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages, counts, or references to predefined splits.
Hardware Specification | Yes | Experiments with ATAC, IQL, and TD3+BC are run on Standard F4S V2 nodes of Azure, and experiments with CQL are run on NC6S V2 nodes of Azure.
Software Dependencies | No | The paper mentions that "The first-order optimization is implemented by ADAM (Kingma and Ba, 2014)" and refers to base methods using PyTorch (in Table 3). However, it does not provide specific version numbers for any software components (e.g., Python, PyTorch, or other libraries).
Experiment Setup | Yes | For each dataset, the hyperparameters of the base methods are tuned over six different configurations suggested by the original papers. Such configurations are summarized in Table 5. ... The first-order optimization is implemented by ADAM (Kingma and Ba, 2014) with a minibatch size of 256. The learning rates are selected following the original implementation and are reported in Table 4. (A minimal sketch of these optimizer settings follows the table.)
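
To make the relabeling step in Algorithm 1 concrete, here is a minimal Python sketch. It is an illustration rather than the authors' implementation: the heuristic h_t is taken to be the discounted Monte Carlo return-to-go of the logged trajectory, the per-trajectory blending factors λ_t are collapsed into a single constant lam, and the blending rule r̃ = r + λγh(s'), γ̃ = (1 - λ)γ is assumed; the trajectory keys ("obs", "act", "next_obs", "rew") and the function name are placeholders.

```python
import numpy as np

def relabel_with_hubl(trajectories, gamma, lam=0.5):
    """Sketch of HUBL-style relabeling (cf. Algorithm 1).

    Each trajectory is a dict of aligned arrays: "obs", "act",
    "next_obs", "rew". `lam` stands in for the paper's per-trajectory
    blending factors lambda_t.
    """
    relabeled = []
    for traj in trajectories:
        rew = np.asarray(traj["rew"], dtype=np.float64)
        T = len(rew)

        # Heuristic h_t: discounted Monte Carlo return-to-go of the
        # logged trajectory, computed backwards in time (assumption).
        h = np.zeros(T + 1)
        for t in reversed(range(T)):
            h[t] = rew[t] + gamma * h[t + 1]

        # Blend the heuristic into the reward and shrink the discount
        # (assumed form): r~_t = r_t + lam * gamma * h_{t+1},
        # gamma~_t = (1 - lam) * gamma.
        r_tilde = rew + lam * gamma * h[1:]
        gamma_tilde = (1.0 - lam) * gamma * np.ones(T)

        relabeled.append({
            "obs": traj["obs"],
            "act": traj["act"],
            "next_obs": traj["next_obs"],
            "rew": r_tilde,
            "discount": gamma_tilde,
        })
    return relabeled
```

The relabeled transitions then replace the original dataset as input to an unmodified base offline RL learner (CQL, TD3+BC, IQL, or ATAC), which is why HUBL can be layered on top of these methods without changing their code.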
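
For the optimization settings quoted in the Experiment Setup row, a short PyTorch sketch follows. Only the optimizer choice (Adam) and the minibatch size of 256 come from the paper; the network shape, the learning rate value, and the synthetic data are placeholders, since the paper selects per-method learning rates (its Table 4) and six hyperparameter configurations (its Table 5) that are not reproduced here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder critic network and synthetic transitions; only Adam and
# the minibatch size of 256 are taken from the paper's setup.
critic = torch.nn.Sequential(
    torch.nn.Linear(17, 256), torch.nn.ReLU(), torch.nn.Linear(256, 1)
)
optimizer = torch.optim.Adam(critic.parameters(), lr=3e-4)  # lr is per-method (Table 4)

data = TensorDataset(torch.randn(1024, 17), torch.randn(1024, 1))
loader = DataLoader(data, batch_size=256, shuffle=True)  # minibatch size of 256
```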