Confidence-Conditioned Value Functions for Offline Reinforcement Learning
Authors: Joey Hong, Aviral Kumar, Sergey Levine
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically show that our algorithm outperforms existing conservative offline RL algorithms on multiple discrete control domains. [...] Our empirical results also confirm that conditioning on confidence, and controlling the confidence from online observations, can lead to significant improvements in performance. [...] In our experiments, we aim to evaluate our algorithm, CCVL, on discrete-action offline RL tasks. |
| Researcher Affiliation | Academia | Joey Hong, Aviral Kumar, Sergey Levine University of California, Berkeley {joey_hong,aviralk}@berkeley.edu, svlevine@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Confidence-Conditioned Value Learning (CCVL) (see the illustrative sketch after the table) |
| Open Source Code | No | The paper does not provide a link to its source code or explicitly state that it is open-source. |
| Open Datasets | Yes | Next, we evaluate our algorithm against prior methods on Atari games (Bellemare et al., 2013) with offline datasets of varying size and quality, previously considered by Agarwal et al. (2020); Kumar et al. (2020). We follow the exact setup of Kumar et al. (2022), including evaluating across the same set of 17 games, using the same three offline datasets, with 1% and 5% of samples uniformly drawn from the DQN replay dataset introduced in Agarwal et al. (2020), as well as a more suboptimal dataset consisting of 10% of the initial samples from the DQN dataset (corresponding to the first 20M observations during online DQN). (A dataset-subsampling sketch appears after the table.) |
| Dataset Splits | No | The paper describes the datasets and their sizes, but does not specify explicit train/validation/test splits or a methodology for creating them (e.g., percentages, counts, or explicit standard split names). |
| Hardware Specification | No | The paper mentions "compute resources from Google Cloud" in the acknowledgements but does not specify any particular hardware (e.g., GPU models, CPU types, or specific cloud instance configurations). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The REM and CQL baselines use exactly the hyperparameter configurations used by Kumar et al. (2022). We refer to Table E.1 of Kumar et al. (2022) for a table of hyperparameters used. For completeness, we also reproduce the table in this section. Table 3: Hyperparameters used by the offline RL Atari agents in our experiments. |
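
The paper presents CCVL only as pseudocode (Algorithm 1), without released code. As a rough illustration of the core idea, the tabular sketch below maintains one Q-table per confidence level and applies a conservatism penalty that grows with the confidence parameter; the count-based penalty form, the function name `ccvl_update`, and all hyperparameters are our own assumptions for exposition, not the authors' exact method.

```python
import numpy as np

# Tabular sketch of confidence-conditioned value learning.
# NOTE: illustrative assumption only -- this is NOT the paper's Algorithm 1;
# the count-based penalty and all hyperparameters are placeholders.

N_STATES, N_ACTIONS = 10, 4
DELTAS = np.linspace(0.1, 0.9, 5)   # discrete grid of confidence levels
GAMMA, LR = 0.99, 0.1

# One Q-table per confidence level: shape (n_deltas, |S|, |A|).
Q = np.zeros((len(DELTAS), N_STATES, N_ACTIONS))
# counts[s, a] = number of times (s, a) occurs in the offline dataset.
counts = np.ones((N_STATES, N_ACTIONS))

def ccvl_update(Q, counts, s, a, r, s_next):
    """One Bellman backup per confidence level; the pessimism penalty
    grows with delta and shrinks with dataset coverage of (s, a)."""
    for i, delta in enumerate(DELTAS):
        penalty = delta / np.sqrt(counts[s, a])   # assumed penalty form
        target = r + GAMMA * Q[i, s_next].max() - penalty
        Q[i, s, a] += LR * (target - Q[i, s, a])

# Example: replay one offline transition (s=0, a=1, r=1.0, s'=2).
counts[0, 1] += 1
ccvl_update(Q, counts, s=0, a=1, r=1.0, s_next=2)
```

Conditioning on a grid of confidence levels is what would let an agent select, at deployment time, how conservative its value estimates should be, which matches the paper's stated motivation of controlling confidence from online observations.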
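
The "Open Datasets" row quotes the dataset construction: 1% and 5% uniform subsamples of the DQN Replay dataset, plus a suboptimal dataset built from the first 10% of samples. A minimal sketch of how such subsamples could be drawn is below; the `transitions` array and its layout are stand-ins we assume for illustration, since the paper follows the released pipeline of Agarwal et al. (2020) rather than specifying code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the full DQN Replay dataset (indices of stored transitions);
# the real storage format of Agarwal et al. (2020) differs.
n_total = 1_000_000
transitions = np.arange(n_total)

def uniform_subsample(data, fraction, rng):
    """Uniformly sample a fraction of transitions without replacement
    (the 1% and 5% datasets)."""
    k = int(len(data) * fraction)
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx]

def initial_subsample(data, fraction):
    """Take the first fraction of transitions in recording order
    (the suboptimal 10%-of-initial-samples dataset)."""
    k = int(len(data) * fraction)
    return data[:k]

ds_1pct = uniform_subsample(transitions, 0.01, rng)
ds_5pct = uniform_subsample(transitions, 0.05, rng)
ds_initial_10pct = initial_subsample(transitions, 0.10)
```

The distinction between the two samplers matters for reproducibility: uniform subsamples preserve the quality mixture of the full replay buffer, whereas the initial-samples dataset deliberately captures lower-quality early-training behavior.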