Offline Reinforcement Learning with Implicit Q-Learning
Authors: Ilya Kostrikov, Ashvin Nair, Sergey Levine
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm, implicit Q-Learning (IQL), aims to estimate this objective while evaluating the Q-function only on the state-action pairs in the dataset. ... Furthermore, our approach demonstrates the state-of-the-art performance on D4RL, a popular benchmark for offline reinforcement learning. ... Our experiments aim to evaluate our method comparatively, in contrast to prior offline RL methods... |
| Researcher Affiliation | Academia | Ilya Kostrikov, Ashvin Nair & Sergey Levine Department of Electrical Engineering and Computer Science University of California, Berkeley {kostrikov,anair17}@berkeley.edu, svlevine@eecs.berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Implicit Q-learning (a sketch of the corresponding losses follows the table) |
| Open Source Code | Yes | For the baselines, we measure runtime for our reimplementations of the methods in JAX (Bradbury et al., 2018) built on top of JAXRL (Kostrikov, 2021), which are typically faster than the original implementations. |
| Open Datasets | Yes | Furthermore, our approach demonstrates the state-of-the-art performance on D4RL, a popular benchmark for offline reinforcement learning. ... Our approach on the D4RL (Fu et al., 2020) benchmark tasks... |
| Dataset Splits | No | No specific percentages, sample counts, or explicit train/validation/test split methodology is given for the authors' own experiments. The paper refers to standard benchmark environments and general RL terminology, but not to reproducible splits for this work. |
| Hardware Specification | Yes | First, our algorithm is computationally efficient: we can perform 1M updates on one GTX1080 GPU in less than 20 minutes. |
| Software Dependencies | No | The paper mentions JAX (Bradbury et al., 2018) and cites JAXRL (Kostrikov, 2021), a GitHub repository, but does not provide version numbers for these or for other software libraries such as PyTorch or TensorFlow. |
| Experiment Setup | No | The paper discusses the effect of the expectile hyperparameter τ and the inverse temperature β used for policy extraction, but the main text does not give a comprehensive list of hyperparameters (e.g., learning rates, batch sizes, optimizer settings). For online fine-tuning it states 'Exact experimental details are provided in Appendix C', so these details are not in the main body of the paper. |
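
For reference, Algorithm 1 alternates three updates whose losses the paper defines (up to notation) as follows, with τ the expectile and β the inverse temperature:

$$L_V(\psi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[L_2^\tau\!\big(Q_{\hat\theta}(s,a) - V_\psi(s)\big)\right], \qquad L_2^\tau(u) = |\tau - \mathbb{1}(u < 0)|\,u^2$$

$$L_Q(\theta) = \mathbb{E}_{(s,a,s')\sim\mathcal{D}}\!\left[\big(r(s,a) + \gamma V_\psi(s') - Q_\theta(s,a)\big)^2\right]$$

$$L_\pi(\phi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[\exp\!\big(\beta\,(Q_{\hat\theta}(s,a) - V_\psi(s))\big)\,\log \pi_\phi(a \mid s)\right] \quad \text{(maximized)}$$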
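
Below is a minimal JAX sketch of these losses, assuming batched arrays of Q-values, state values, rewards, and policy log-probabilities have already been computed by the corresponding networks. The function names, default hyperparameter values, and the advantage-weight clip are illustrative assumptions, not the authors' implementation.

```python
import jax.numpy as jnp


def expectile_loss(diff: jnp.ndarray, tau: float = 0.7) -> jnp.ndarray:
    """L_2^tau(u) = |tau - 1(u < 0)| * u^2, averaged over the batch."""
    weight = jnp.where(diff < 0, 1.0 - tau, tau)
    return jnp.mean(weight * diff ** 2)


def value_loss(q_target: jnp.ndarray, v: jnp.ndarray,
               tau: float = 0.7) -> jnp.ndarray:
    # Fit V(s) to an upper expectile of Q(s, a) over dataset actions,
    # so the value function approximates a maximum without querying
    # out-of-distribution actions.
    return expectile_loss(q_target - v, tau)


def q_loss(q: jnp.ndarray, reward: jnp.ndarray, not_done: jnp.ndarray,
           next_v: jnp.ndarray, discount: float = 0.99) -> jnp.ndarray:
    # TD backup that bootstraps through V(s') instead of max_a' Q(s', a'),
    # so Q is only evaluated at state-action pairs in the dataset.
    target = reward + discount * not_done * next_v
    return jnp.mean((q - target) ** 2)


def awr_policy_loss(log_prob: jnp.ndarray, q_target: jnp.ndarray,
                    v: jnp.ndarray, beta: float = 3.0) -> jnp.ndarray:
    # Advantage-weighted regression: weight log pi(a|s) by exp(beta * A).
    # Clipping the weights is a common implementation choice for
    # numerical stability (an assumption here, not taken from the paper).
    weights = jnp.minimum(jnp.exp(beta * (q_target - v)), 100.0)
    return -jnp.mean(weights * log_prob)
```

In the full algorithm these losses are minimized with separate optimizers, with the target network Q_θ̂ updated by Polyak averaging, per Algorithm 1 in the paper.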