A Connection between One-Step RL and Critic Regularization in Reinforcement Learning

Authors: Benjamin Eysenbach, Matthieu Geist, Sergey Levine, Ruslan Salakhutdinov

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | While our theoretical results require assumptions (e.g., deterministic dynamics), our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters.
Researcher Affiliation | Collaboration | Google Research, Carnegie Mellon University, UC Berkeley.
Pseudocode | No | The paper describes algorithms and updates but does not provide structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code for the tabular experiments is available online: https://github.com/ben-eysenbach/ac-connection
Open Datasets | Yes | we will repeat our experiments on four datasets from the D4RL benchmark (Fu et al., 2020).
Dataset Splits | No | The paper mentions using datasets for experiments but does not explicitly detail training, validation, and test splits (e.g., percentages or sample counts for each split).
Hardware Specification | No | The paper describes experimental setups and environments (e.g., gridworld, D4RL benchmark) but does not specify hardware details such as GPU/CPU models, memory, or the computing platform used to run the experiments.
Software Dependencies | No | The paper mentions using the "implementation of one-step RL (reverse KL) and CQL provided by Hoffman et al. (2020)", which refers to another work's implementation, but it does not list software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | We use γ = 0.95 and train for 20k full-batch updates, using a learning rate of 1e-2. The Q table is randomly initialized using a standard normal distribution.
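The experiment-setup details above translate directly into a small tabular configuration. Below is a minimal sketch, not the authors' code, showing a standard-normal Q-table initialization and full-batch critic updates with the quoted hyperparameters (γ = 0.95, 20k updates, learning rate 1e-2); the gridworld size, the synthetic offline dataset, and the plain TD update (without CQL-style regularization) are assumptions made for illustration. The authors' actual tabular implementation is at https://github.com/ben-eysenbach/ac-connection.

```python
import numpy as np

# Hyperparameters quoted in the paper's experiment setup.
gamma, lr, num_updates = 0.95, 1e-2, 20_000
num_states, num_actions = 25, 4  # assumed small gridworld; sizes are illustrative

rng = np.random.default_rng(0)
# Q table randomly initialized from a standard normal distribution.
Q = rng.standard_normal((num_states, num_actions))

# Assumed offline dataset of (s, a, r, s') transitions for illustration only.
s = rng.integers(0, num_states, size=500)
a = rng.integers(0, num_actions, size=500)
r = rng.random(500)
s_next = rng.integers(0, num_states, size=500)

for _ in range(num_updates):
    # One full-batch temporal-difference update of the critic over the whole
    # dataset (critic regularization terms such as CQL's are omitted here).
    target = r + gamma * Q[s_next].max(axis=1)
    td_error = target - Q[s, a]
    np.add.at(Q, (s, a), lr * td_error)  # accumulates updates for repeated (s, a) pairs
```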