Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Offline Reinforcement Learning with Implicit Q-Learning
Authors: Ilya Kostrikov, Ashvin Nair, Sergey Levine
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm, implicit Q-Learning (IQL), aims to estimate this objective while evaluating the Q-function only on the state-action pairs in the dataset. ... Furthermore, our approach demonstrates the state-of-the-art performance on D4RL, a popular benchmark for offline reinforcement learning. ... Our experiments aim to evaluate our method comparatively, in contrast to prior offline RL methods... |
| Researcher Affiliation | Academia | Ilya Kostrikov, Ashvin Nair & Sergey Levine Department of Electrical Engineering and Computer Science University of California, Berkeley EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 Implicit Q-learning |
| Open Source Code | Yes | For the baselines, we measure runtime for our reimplementations of the methods in JAX (Bradbury et al., 2018) built on top of JAXRL (Kostrikov, 2021), which are typically faster than the original implementations. |
| Open Datasets | Yes | Furthermore, our approach demonstrates the state-of-the-art performance on D4RL, a popular benchmark for offline reinforcement learning. ... Our approach on the D4RL (Fu et al., 2020) benchmark tasks... |
| Dataset Splits | No | No specific percentages, sample counts, or explicit methodology for training/validation/test splits were found for the authors' own experiments. The paper refers to standard benchmark environments and general RL terminology, but does not specify reproducible splits for its own work. |
| Hardware Specification | Yes | First, our algorithm is computationally efficient: we can perform 1M updates on one GTX1080 GPU in less than 20 minutes. |
| Software Dependencies | No | The paper mentions JAX (Bradbury et al., 2018) and cites JAXRL (Kostrikov, 2021), which is a GitHub repository, but it does not provide version numbers for these or for other software libraries such as PyTorch or TensorFlow. |
| Experiment Setup | No | The paper discusses the effect of the expectile hyperparameter tau and the inverse temperature beta used for policy extraction, but a comprehensive list of hyperparameters (e.g., learning rates, batch sizes, optimizer settings) is not provided in the main text. For online fine-tuning, it states 'Exact experimental details are provided in Appendix C', implying that these details are not in the main body of the paper. |
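For context on the expectile hyperparameter the table refers to: IQL fits a value function by asymmetric least squares (expectile regression) on TD targets. The following is a minimal illustrative sketch of that asymmetric L2 loss, not the authors' implementation; the function name and the representative value `tau=0.7` are our choices for illustration.

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 (expectile) loss on a residual `diff`.

    Weights positive residuals by tau and negative ones by (1 - tau),
    so tau > 0.5 penalizes under-estimation more heavily. In IQL the
    residual would be Q(s, a) - V(s); here `diff` is just an array.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return weight * diff ** 2

# For tau = 0.5 this reduces to a symmetric (scaled) squared error.
residuals = np.array([2.0, -2.0])
losses = expectile_loss(residuals, tau=0.7)  # [0.7*4, 0.3*4] = [2.8, 1.2]
```

With `tau` near 1 the regression target approaches a maximum over the dataset's actions, which is the mechanism the paper uses to avoid querying out-of-distribution actions.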