Conservative Q-Learning for Offline Reinforcement Learning
Authors: Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experimental Evaluation): 'We compare CQL to prior offline RL methods on a range of domains and dataset compositions, including continuous and discrete action spaces, state observations of varying dimensionality, and high-dimensional image inputs.' |
| Researcher Affiliation | Collaboration | Aviral Kumar (UC Berkeley), Aurick Zhou (UC Berkeley), George Tucker (Google Research, Brain Team), Sergey Levine (UC Berkeley and Google Research, Brain Team) |
| Pseudocode | Yes | Algorithm 1 Conservative Q-Learning (both variants) |
| Open Source Code | No | The paper states 'Our algorithm requires an addition of only 20 lines of code on top of standard implementations of soft actor-critic (SAC) [19] for continuous control experiments and on top of QR-DQN [8] for the discrete control.' but provides neither a link to the source code nor an explicit statement of its release (a hedged sketch of such an addition appears below the table). |
| Open Datasets | Yes | We first evaluate actor-critic CQL, using CQL(H) from Algorithm 1, on continuous control datasets from the D4RL benchmark [12]. ... using the dataset released by the authors [3]. |
| Dataset Splits | No | The paper uses standard benchmarks like D4RL but does not explicitly state the specific training/validation/test dataset splits (percentages or counts) used for their experiments within the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using standard implementations of soft actor-critic (SAC) and QR-DQN, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We use default hyperparameters from SAC, except that the learning rate for the policy was chosen from {3e-5, 1e-4, 3e-4}, and is less than or equal to the Q-function, as dictated by Theorem 3.3. Elaborate details are provided in Appendix F. (A hedged configuration sketch reflecting these choices appears below the table.) |
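
The 'Open Source Code' row quotes the paper's claim that CQL adds only about 20 lines on top of a standard SAC implementation. Below is a minimal sketch of what such an addition could look like for the CQL(H) critic loss of Algorithm 1; it is not the authors' code. The helper names (`q_net`, `target_q_net`, `policy.sample`), the tensor layout, and the default `alpha` are assumptions, and the log-sum-exp term is approximated with policy samples rather than the paper's importance-sampled estimator.

```python
import torch

def cql_h_critic_loss(q_net, target_q_net, policy, batch,
                      alpha=5.0, gamma=0.99, num_sampled_actions=10):
    """Hedged sketch: CQL(H) penalty added to a SAC-style Bellman error.

    Assumptions (not from the paper's released code): q_net(s, a) returns a
    (batch,) tensor of Q-values, policy.sample(s, n) returns a
    (batch, n, act_dim) tensor of candidate actions, and `batch` holds the
    tensors s, a, r, s2, done drawn from the offline dataset.
    """
    s, a, r, s2, done = batch["s"], batch["a"], batch["r"], batch["s2"], batch["done"]

    # Standard Bellman backup (a full SAC critic would also subtract an entropy term).
    with torch.no_grad():
        a2 = policy.sample(s2, 1).squeeze(1)
        target = r + gamma * (1.0 - done) * target_q_net(s2, a2)
    bellman_error = ((q_net(s, a) - target) ** 2).mean()

    # CQL(H) regularizer: push down a log-sum-exp of Q over sampled actions,
    # push up Q-values on the actions actually present in the dataset.
    sampled = policy.sample(s, num_sampled_actions)            # (batch, n, act_dim)
    q_sampled = torch.stack(
        [q_net(s, sampled[:, i]) for i in range(num_sampled_actions)], dim=1
    )                                                          # (batch, n)
    conservative_penalty = (torch.logsumexp(q_sampled, dim=1) - q_net(s, a)).mean()

    return bellman_error + alpha * conservative_penalty
```

The conservative penalty is the only change relative to the underlying SAC critic update, which is consistent with the '20 lines of code' characterization quoted above.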
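
The 'Experiment Setup' row states that SAC defaults were kept, with the policy learning rate swept over {3e-5, 1e-4, 3e-4} and constrained to not exceed the Q-function learning rate (Theorem 3.3). A small configuration sketch encoding that constraint is shown below; the field names and the assumed Q-function learning rate of 3e-4 (a common SAC default) are illustrative, not values stated in the row.

```python
# Hedged configuration sketch for the continuous-control setup described above.
# Only the policy-lr grid and the ordering constraint come from the quoted text;
# the Q-function learning rate and the field names are assumptions.
POLICY_LR_GRID = [3e-5, 1e-4, 3e-4]

def make_config(policy_lr: float, q_lr: float = 3e-4) -> dict:
    """Build one run configuration, enforcing policy_lr <= q_lr (Theorem 3.3)."""
    if policy_lr not in POLICY_LR_GRID:
        raise ValueError(f"policy_lr must be one of {POLICY_LR_GRID}")
    if policy_lr > q_lr:
        raise ValueError("policy lr must not exceed the Q-function lr (Theorem 3.3)")
    return {"policy_lr": policy_lr, "q_lr": q_lr}  # remaining hyperparameters: SAC defaults

configs = [make_config(lr) for lr in POLICY_LR_GRID]
```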