Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Authors: Mitsuhiko Nakamoto, Simon Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, Cal-QL outperforms state-of-the-art methods on 9/11 fine-tuning benchmark tasks that we study in this paper. Code and video are available at https://nakamotoo.github.io/Cal-QL |
| Researcher Affiliation | Academia | Mitsuhiko Nakamoto¹, Yuexiang Zhai¹, Anikait Singh¹, Max Sobol Mark², Yi Ma¹, Chelsea Finn², Aviral Kumar¹, Sergey Levine¹; ¹UC Berkeley, ²Stanford University |
| Pseudocode | Yes | Algorithm 1 Cal-QL pseudo-code |
| Open Source Code | Yes | Code and video are available at https://nakamotoo.github.io/Cal-QL |
| Open Datasets | Yes | We evaluate Cal-QL on a number of benchmark tasks and datasets used by prior works [30, 45] to evaluate fine-tuning performance: (1) The Ant Maze tasks from D4RL [10] that require controlling an ant quadruped robot to navigate from a starting point to a desired goal location in a maze. |
| Dataset Splits | No | The paper describes training and fine-tuning phases with specific step counts, but does not explicitly provide training/validation/test *dataset splits* with percentages or sample counts. |
| Hardware Specification | Yes | We used a single NVIDIA TITAN RTX chip to run each of our experiments. |
| Software Dependencies | No | The paper mentions building code upon 'Jax CQL' but does not provide specific version numbers for Python, JAX, PyTorch, or other libraries. |
| Experiment Setup | Yes | We list the hyperparameters for CQL and Cal-QL in Table 3. We utilized a variant of Bellman backup that computes the target value by performing a maximization over target values computed for k actions sampled from the policy at the next state, where we used k = 4 in visual pick and place domain and k = 10 in others. |
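
The Bellman backup variant quoted in the Experiment Setup row (a maximization over target values for k actions sampled from the policy at the next state) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `max_q_backup_target`, the callables `sample_action` and `target_q`, the discount of 0.99, and the handling of terminal transitions are all assumptions made for illustration. Only the max over k policy-sampled actions and the values k = 4 and k = 10 come from the quoted text.

```python
import random
from typing import Callable, Sequence

State = Sequence[float]
Action = Sequence[float]


def max_q_backup_target(
    reward: float,
    next_state: State,
    done: bool,
    sample_action: Callable[[State], Action],    # hypothetical policy sampler
    target_q: Callable[[State, Action], float],  # hypothetical target Q-function
    k: int = 10,             # the paper reports k = 10 (k = 4 for visual pick and place)
    discount: float = 0.99,  # assumed value, not taken from the source
) -> float:
    """Bellman target that maximizes over k policy-sampled next actions."""
    if done:
        # Assumption: no bootstrapping past a terminal transition.
        return reward
    # Sample k candidate actions from the policy at the next state.
    candidates = [sample_action(next_state) for _ in range(k)]
    # Evaluate the target Q-function on each candidate and keep the largest value.
    best_next_q = max(target_q(next_state, a) for a in candidates)
    return reward + discount * best_next_q


# Toy stand-ins (hypothetical): a 1-D Gaussian policy and a quadratic Q-function.
rng = random.Random(0)


def toy_policy(state: State) -> Action:
    return [rng.gauss(0.0, 1.0)]


def toy_target_q(state: State, action: Action) -> float:
    return -(action[0] - 0.5) ** 2


target = max_q_backup_target(1.0, [0.0], False, toy_policy, toy_target_q, k=4)
```

With a standard backup, the target would use a single sampled next action; taking the maximum over k samples biases the target toward the best action the policy proposes, and a larger k tightens that maximum at the cost of k target-network evaluations per transition.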