A Connection between One-Step RL and Critic Regularization in Reinforcement Learning
Authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While our theoretical results require assumptions (e.g., deterministic dynamics), our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters. |
| Researcher Affiliation | Collaboration | Google Research, Carnegie Mellon University, UC Berkeley. |
| Pseudocode | No | The paper describes algorithms and updates but does not provide structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code for the tabular experiments is available online: https://github.com/ben-eysenbach/ac-connection |
| Open Datasets | Yes | "we will repeat our experiments on four datasets from the D4RL benchmark (Fu et al., 2020)." (A hedged data-loading sketch is given below the table.) |
| Dataset Splits | No | The paper mentions using datasets for experiments but does not explicitly detail training, validation, and test splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper describes experimental setups and environments (e.g., gridworld, D4RL benchmark) but does not specify any hardware details such as GPU/CPU models, memory, or specific computing platforms used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "implementation of one-step RL (reverse KL) and CQL provided by Hoffman et al. (2020)", which refers to another paper's implementation. However, it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | "We use γ = 0.95 and train for 20k full-batch updates, using a learning rate of 1e-2. The Q table is randomly initialized using a standard normal distribution." (A minimal sketch of this setup follows the table.) |
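For reference, the D4RL datasets cited in the Open Datasets row are distributed through the `d4rl` Python package. The sketch below is a minimal illustration rather than the paper's pipeline; the dataset name `halfcheetah-medium-v2` is a placeholder, since this excerpt does not name the four datasets the authors used.

```python
# Minimal sketch of loading a D4RL dataset. The specific dataset name is an
# assumption (the excerpt does not list the paper's four datasets).
# Requires `pip install gym d4rl` and a working MuJoCo installation.
import gym
import d4rl  # registers the D4RL environments with gym on import

env = gym.make('halfcheetah-medium-v2')  # placeholder dataset choice
dataset = d4rl.qlearning_dataset(env)    # dict of transition arrays

# Arrays keyed by 'observations', 'actions', 'rewards',
# 'next_observations', and 'terminals'.
print(dataset['observations'].shape)
print(dataset['actions'].shape)
print(dataset['rewards'].shape)
```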
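The Experiment Setup row can be illustrated with a short tabular sketch. Only the quoted hyperparameters (γ = 0.95, 20k full-batch updates, learning rate 1e-2, standard-normal Q-table initialization) come from the paper; the gridworld size, the synthetic transitions, and the plain full-batch TD backup are illustrative assumptions standing in for the paper's actual objectives (CQL and one-step RL).

```python
# Minimal sketch of the tabular setup quoted above. The 5x5 gridworld,
# the synthetic dataset, and the vanilla TD(0) backup are assumptions;
# the paper's loss functions (CQL, one-step RL) are not reproduced here.
import numpy as np

num_states, num_actions = 25, 4          # hypothetical 5x5 gridworld
gamma, lr, num_updates = 0.95, 1e-2, 20_000  # values quoted from the paper

rng = np.random.default_rng(0)
Q = rng.standard_normal((num_states, num_actions))  # N(0, 1) init, as quoted

# Placeholder offline dataset of transitions (s, a, r, s'); in the paper
# these would come from the behavioral data.
s = rng.integers(num_states, size=500)
a = rng.integers(num_actions, size=500)
r = rng.standard_normal(500)
s_next = rng.integers(num_states, size=500)

for _ in range(num_updates):
    # One full-batch TD(0) backup over every transition in the dataset.
    target = r + gamma * Q[s_next].max(axis=1)
    Q[s, a] += lr * (target - Q[s, a])
```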