Confidence-Conditioned Value Functions for Offline Reinforcement Learning

Authors: Joey Hong, Aviral Kumar, Sergey Levine

ICLR 2023

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: "Finally, we empirically show that our algorithm outperforms existing conservative offline RL algorithms on multiple discrete control domains. [...] Our empirical results also confirm that conditioning on confidence, and controlling the confidence from online observations, can lead to significant improvements in performance. [...] In our experiments, we aim to evaluate our algorithm, CCVL, on discrete-action offline RL tasks."

Reproducibility Variable: Researcher Affiliation
Result: Academia
LLM Response: "Joey Hong, Aviral Kumar, Sergey Levine. University of California, Berkeley. {joey_hong,aviralk}@berkeley.edu, svlevine@eecs.berkeley.edu"

Reproducibility Variable: Pseudocode
Result: Yes
LLM Response: "Algorithm 1: Confidence-Conditioned Value Learning (CCVL)". An illustrative sketch of a confidence-conditioned Q-update appears at the end of this section.

Reproducibility Variable: Open Source Code
Result: No
LLM Response: The paper does not provide a link to its source code or explicitly state that the code is open source.

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: "Next, we evaluate our algorithm against prior methods on Atari games (Bellemare et al., 2013) with offline datasets of varying size and quality, previously considered by Agarwal et al. (2020); Kumar et al. (2020). We follow the exact setup of Kumar et al. (2022), including evaluating across the same set of 17 games, using the same three offline datasets, with 1% and 5% of samples uniformly drawn from the DQN replay dataset introduced in Agarwal et al. (2020), as well as a more suboptimal dataset consisting of 10% of the initial samples from the DQN dataset (corresponding to the first 20M observations during online DQN)." A sketch of this subsampling scheme appears at the end of this section.

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: The paper describes the datasets and their sizes, but does not specify explicit train/validation/test splits or a methodology for creating them (e.g., percentages, counts, or standard named splits).

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper mentions "compute resources from Google Cloud" in the acknowledgements but does not specify any particular hardware (e.g., GPU models, CPU types, or specific cloud instance configurations).

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions).

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: "The REM and CQL baselines use exactly the hyperparameter configurations used by Kumar et al. (2022). We refer to Table E.1 of Kumar et al. (2022) for a table of hyperparameters used. For completeness, we also reproduce the table in this section. Table 3: Hyperparameters used by the offline RL Atari agents in our experiments."
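
The Pseudocode entry refers to the paper's Algorithm 1 (CCVL), which is not reproduced here. Below is a minimal, hypothetical sketch of what a confidence-conditioned Q-update could look like, based only on the descriptions quoted above: the Q-network takes a confidence level delta as an extra input, and a conservatism penalty (CQL-style, as used by the paper's baselines) grows as delta shrinks. The log(1/delta) weighting, the architecture, and all names are illustrative assumptions, not the authors' Algorithm 1.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConfidenceConditionedQ(nn.Module):
        """Q-network that conditions on a confidence level delta in (0, 1]."""

        def __init__(self, obs_dim: int, num_actions: int, hidden: int = 256):
            super().__init__()
            # delta enters as one extra scalar feature alongside the observation.
            self.net = nn.Sequential(
                nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_actions),
            )

        def forward(self, obs: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
            # obs: (B, obs_dim), delta: (B, 1) -> Q-values: (B, num_actions).
            return self.net(torch.cat([obs, delta], dim=-1))

    def confidence_conditioned_step(q, q_target, batch, optimizer, gamma=0.99):
        """One hypothetical training step on a batch of offline transitions."""
        obs, act, rew, next_obs, done = batch
        # Sample a confidence level per transition so one network learns the
        # whole family of delta-conditioned value functions at once.
        delta = torch.rand(obs.shape[0], 1, device=obs.device).clamp(min=1e-3)

        with torch.no_grad():
            next_q = q_target(next_obs, delta).max(-1).values
            target = rew + gamma * (1.0 - done) * next_q
        q_sa = q(obs, delta).gather(1, act.unsqueeze(-1)).squeeze(-1)
        td_loss = F.mse_loss(q_sa, target)

        # CQL-style penalty on out-of-distribution actions, weighted by an
        # assumed log(1/delta) factor: smaller delta, more conservative values.
        penalty = torch.logsumexp(q(obs, delta), dim=-1) - q_sa
        alpha = torch.log(1.0 / delta.squeeze(-1))
        loss = td_loss + (alpha * penalty).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Sampling delta per batch element is one way such a network could cover all confidence levels jointly, which would let the confidence be adjusted from online observations, as the quoted abstract describes, without retraining.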
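
The Open Datasets entry describes three variants carved from the DQN replay dataset of Agarwal et al. (2020): 1% and 5% uniform subsamples, and the first 10% of samples from early online DQN training. A minimal sketch of that subsampling, assuming the replay data is addressable by transition index (names are illustrative, not from the paper's pipeline):

    import numpy as np

    def dataset_variant_indices(num_transitions: int, rng: np.random.Generator):
        """Index sets for the three offline dataset variants described above."""
        # 1% and 5% of samples, drawn uniformly over the full replay dataset.
        uniform_1pct = rng.choice(num_transitions, size=num_transitions // 100,
                                  replace=False)
        uniform_5pct = rng.choice(num_transitions, size=num_transitions // 20,
                                  replace=False)
        # Suboptimal variant: the first 10% of samples, i.e. transitions
        # logged earliest during online DQN training.
        initial_10pct = np.arange(num_transitions // 10)
        return uniform_1pct, uniform_5pct, initial_10pct

    # Example with a replay dataset of 50M transitions (200M frames at frame skip 4).
    one_pct, five_pct, initial = dataset_variant_indices(
        50_000_000, np.random.default_rng(seed=0))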