Confidence-Conditioned Value Functions for Offline Reinforcement Learning
Authors: Joey Hong, Aviral Kumar, Sergey Levine
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically show that our algorithm outperforms existing conservative offline RL algorithms on multiple discrete control domains. [...] Our empirical results also confirm that conditioning on confidence, and controlling the confidence from online observations, can lead to significant improvements in performance. [...] In our experiments, we aim to evaluate our algorithm, CCVL, on discrete-action offline RL tasks. |
| Researcher Affiliation | Academia | Joey Hong, Aviral Kumar, Sergey Levine University of California, Berkeley {joey_hong,aviralk}@berkeley.edu, svlevine@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Confidence-Conditioned Value Learning (CCVL) (see the illustrative sketch after the table) |
| Open Source Code | No | The paper does not provide a link to its source code or explicitly state that it is open-source. |
| Open Datasets | Yes | Next, we evaluate our algorithm against prior methods on Atari games (Bellemare et al., 2013) with offline datasets of varying size and quality, previously considered by Agarwal et al. (2020); Kumar et al. (2020). We follow the exact setup of Kumar et al. (2022), including evaluating across the same set of 17 games, using the same three offline datasets, with 1% and 5% of samples uniformly drawn from the DQN replay dataset introduced in Agarwal et al. (2020), as well as a more suboptimal dataset consisting of 10% of the initial samples from the DQN dataset (corresponding to the first 20M observations during online DQN). (A dataset-subsampling sketch appears after the table.) |
| Dataset Splits | No | The paper describes the datasets and their sizes, but does not specify explicit train/validation/test splits or a methodology for creating them (e.g., percentages, counts, or explicit standard split names). |
| Hardware Specification | No | The paper mentions "compute resources from Google Cloud" in the acknowledgements but does not specify any particular hardware (e.g., GPU models, CPU types, or specific cloud instance configurations). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The REM and CQL baselines use exactly the hyperparameter configurations used by Kumar et al. (2022). We refer to Table E.1 of Kumar et al. (2022) for a table of hyperparameters used. For completeness, we also reproduce the table in this section. Table 3: Hyperparameters used by the offline RL Atari agents in our experiments. |
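
The paper presents CCVL only as pseudocode (Algorithm 1), without released code. As a rough illustration of the core idea, the tabular sketch below maintains one Q-table per confidence level and applies a conservatism penalty that grows with the confidence parameter; the count-based penalty form, the function name `ccvl_update`, and all hyperparameters are our own assumptions for exposition, not the authors' exact method.

```python
import numpy as np

# Tabular sketch of confidence-conditioned value learning.
# NOTE: illustrative assumption only -- this is NOT the paper's Algorithm 1;
# the count-based penalty and all hyperparameters are placeholders.

N_STATES, N_ACTIONS = 10, 4
DELTAS = np.linspace(0.1, 0.9, 5)   # discrete grid of confidence levels
GAMMA, LR = 0.99, 0.1

# One Q-table per confidence level: shape (n_deltas, |S|, |A|).
Q = np.zeros((len(DELTAS), N_STATES, N_ACTIONS))
# counts[s, a] = number of times (s, a) occurs in the offline dataset.
counts = np.ones((N_STATES, N_ACTIONS))

def ccvl_update(Q, counts, s, a, r, s_next):
    """One Bellman backup per confidence level; the pessimism penalty
    grows with delta and shrinks with dataset coverage of (s, a)."""
    for i, delta in enumerate(DELTAS):
        penalty = delta / np.sqrt(counts[s, a])   # assumed penalty form
        target = r + GAMMA * Q[i, s_next].max() - penalty
        Q[i, s, a] += LR * (target - Q[i, s, a])

# Example: replay one offline transition (s=0, a=1, r=1.0, s'=2).
counts[0, 1] += 1
ccvl_update(Q, counts, s=0, a=1, r=1.0, s_next=2)
```

Conditioning on a grid of confidence levels is what would let an agent select, at deployment time, how conservative its value estimates should be, which matches the paper's stated motivation of controlling confidence from online observations.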
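
The "Open Datasets" row quotes the dataset construction: 1% and 5% uniform subsamples of the DQN Replay dataset, plus a suboptimal dataset built from the first 10% of samples. A minimal sketch of how such subsamples could be drawn is below; the `transitions` array and its layout are stand-ins we assume for illustration, since the paper follows the released pipeline of Agarwal et al. (2020) rather than specifying code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the full DQN Replay dataset (indices of stored transitions);
# the real storage format of Agarwal et al. (2020) differs.
n_total = 1_000_000
transitions = np.arange(n_total)

def uniform_subsample(data, fraction, rng):
    """Uniformly sample a fraction of transitions without replacement
    (the 1% and 5% datasets)."""
    k = int(len(data) * fraction)
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx]

def initial_subsample(data, fraction):
    """Take the first fraction of transitions in recording order
    (the suboptimal 10%-of-initial-samples dataset)."""
    k = int(len(data) * fraction)
    return data[:k]

ds_1pct = uniform_subsample(transitions, 0.01, rng)
ds_5pct = uniform_subsample(transitions, 0.05, rng)
ds_initial_10pct = initial_subsample(transitions, 0.10)
```

The distinction between the two samplers matters for reproducibility: uniform subsamples preserve the quality mixture of the full replay buffer, whereas the initial-samples dataset deliberately captures lower-quality early-training behavior.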