Offline Reinforcement Learning with Implicit Q-Learning

Authors: Ilya Kostrikov, Ashvin Nair, Sergey Levine

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our algorithm, implicit Q-Learning (IQL), aims to estimate this objective while evaluating the Q-function only on the state-action pairs in the dataset. ... Furthermore, our approach demonstrates the state-of-the-art performance on D4RL, a popular benchmark for offline reinforcement learning. ... Our experiments aim to evaluate our method comparatively, in contrast to prior offline RL methods..."
Researcher Affiliation | Academia | "Ilya Kostrikov, Ashvin Nair & Sergey Levine, Department of Electrical Engineering and Computer Science, University of California, Berkeley. {kostrikov,anair17}@berkeley.edu, svlevine@eecs.berkeley.edu"
Pseudocode | Yes | "Algorithm 1 Implicit Q-learning" (a hedged sketch of the losses this algorithm alternates is given after the table)
Open Source Code | Yes | "For the baselines, we measure runtime for our reimplementations of the methods in JAX (Bradbury et al., 2018) built on top of JAXRL (Kostrikov, 2021), which are typically faster than the original implementations."
Open Datasets | Yes | "Furthermore, our approach demonstrates the state-of-the-art performance on D4RL, a popular benchmark for offline reinforcement learning. ... Our approach on the D4RL (Fu et al., 2020) benchmark tasks..." (a minimal loading sketch is given after the table)
Dataset Splits | No | No specific percentages, sample counts, or explicit methodology for training/validation/test splits are reported for the authors' own experiments. The paper refers to standard environments and general RL terminology, but does not specify reproducible splits for its own work.
Hardware Specification | Yes | "First, our algorithm is computationally efficient: we can perform 1M updates on one GTX1080 GPU in less than 20 minutes."
Software Dependencies | No | The paper mentions JAX (Bradbury et al., 2018) and cites JAXRL (Kostrikov, 2021), a GitHub repository, but does not provide version numbers for JAX or any other software library.
Experiment Setup | No | The paper discusses the effect of the expectile parameter τ and the inverse temperature β used for policy extraction, but a comprehensive list of hyperparameters (e.g., learning rates, batch sizes, optimizer settings) is not given in the main text. For online fine-tuning it states "Exact experimental details are provided in Appendix C", so those details appear only in the appendix. (An illustrative configuration skeleton is given after the table.)
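
The Pseudocode row points to Algorithm 1 of the paper. Below is a minimal, hedged sketch of the three losses that algorithm alternates (expectile regression for the value function, a TD backup for the Q-function, and advantage-weighted policy extraction), written with jax.numpy to match the tooling the paper reports. The function names and the default hyperparameter values here are illustrative placeholders, not the paper's reported settings.

```python
# Hedged sketch of the IQL losses (Algorithm 1); names and defaults are illustrative.
import jax.numpy as jnp

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss: residuals above zero are weighted by tau, below by 1 - tau."""
    weight = jnp.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

def value_loss(q_target, v):
    """L_V: regress V(s) toward an expectile of the target Q(s, a) over dataset actions."""
    return expectile_loss(q_target - v).mean()

def q_loss(rewards, v_next, q, discount=0.99):
    """L_Q: TD backup that bootstraps from the value network at the next state."""
    td_target = rewards + discount * v_next
    return ((td_target - q) ** 2).mean()

def awr_weights(q_target, v, beta=3.0, max_weight=100.0):
    """Advantage-weighted regression weights exp(beta * (Q - V)) for policy extraction,
    clipped for numerical stability (a common implementation choice, not a paper claim)."""
    return jnp.minimum(jnp.exp(beta * (q_target - v)), max_weight)
```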
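
The Open Datasets row confirms the experiments use the public D4RL benchmark (Fu et al., 2020). The snippet below is a hedged sketch of loading one D4RL task with the open-source d4rl package; the task name is only an example and is not meant to enumerate the tasks the paper evaluates.

```python
# Hedged sketch of pulling a D4RL task for offline training; the task name is an example.
import gym
import d4rl  # importing registers the offline environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)
# dataset is a dict of NumPy arrays: observations, actions, rewards,
# next_observations, terminals.
print({key: value.shape for key, value in dataset.items()})
```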
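
Because the Experiment Setup row finds no comprehensive hyperparameter list in the main text, the skeleton below only illustrates which settings a reimplementation would need to pin down. Every value is a placeholder for illustration; the paper's actual settings are deferred to its appendix (Appendix C for online fine-tuning).

```python
# Purely illustrative configuration skeleton for an IQL reimplementation.
# All values are placeholders, not the paper's reported numbers.
config = {
    "expectile_tau": 0.7,          # placeholder: asymmetry of the expectile loss
    "inverse_temperature": 3.0,    # placeholder: beta in advantage-weighted extraction
    "discount": 0.99,              # placeholder
    "learning_rate": 3e-4,         # placeholder: actor, critic, and value networks
    "batch_size": 256,             # placeholder
    "gradient_steps": 1_000_000,   # placeholder; the runtime quote above measures 1M updates
}
```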