Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Connection between One-Step RL and Critic Regularization in Reinforcement Learning
Authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | While our theoretical results require assumptions (e.g., deterministic dynamics), our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters. |
| Researcher Affiliation | Collaboration | 1Google Research 2Carnegie Mellon University 3UC Berkeley. |
| Pseudocode | No | The paper describes algorithms and updates but does not provide structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code for the tabular experiments is available online. Code: https://github.com/ben-eysenbach/ac-connection |
| Open Datasets | Yes | we will repeat our experiments on four datasets from the D4RL benchmark (Fu et al., 2020). |
| Dataset Splits | No | The paper mentions using datasets for experiments but does not explicitly detail training, validation, and test splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | No | The paper describes experimental setups and environments (e.g., gridworld, D4RL benchmark) but does not specify any hardware details such as GPU/CPU models, memory, or specific computing platforms used for running the experiments. |
| Software Dependencies | No | The paper mentions using the "implementation of one-step RL (reverse KL) and CQL provided by Hoffman et al. (2020)", which refers to another paper's implementation. However, it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We use γ = 0.95 and train for 20k full-batch updates, using a learning rate of 1e-2. The Q table is randomly initialized using a standard normal distribution. |
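The reported setup (γ = 0.95, 20k full-batch updates, learning rate 1e-2, standard-normal Q-table initialization) can be sketched as a tabular fitted Q-iteration loop. This is a minimal illustrative sketch only: the gridworld size, the randomly generated deterministic dynamics, and the plain Bellman-backup update rule are assumptions, not the authors' actual code (which is available at the repository linked above).

```python
import numpy as np

# Hyperparameters as reported in the paper; everything else below is assumed.
gamma, lr, n_updates = 0.95, 1e-2, 20_000
n_states, n_actions = 16, 4  # hypothetical gridworld size

rng = np.random.default_rng(0)
# Hypothetical deterministic MDP: next-state table and reward table.
P = rng.integers(n_states, size=(n_states, n_actions))  # s' = P[s, a]
R = rng.standard_normal((n_states, n_actions))

# Q table randomly initialized from a standard normal, as reported.
Q = rng.standard_normal((n_states, n_actions))
for _ in range(n_updates):
    target = R + gamma * Q[P].max(axis=-1)  # full-batch Bellman targets
    Q += lr * (target - Q)                  # damped update with lr = 1e-2

print(Q.shape)  # (16, 4)
```

Because the Bellman operator is a γ-contraction, this damped iteration drives the Bellman residual toward zero over the 20k full-batch updates.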