Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Authors: Mitsuhiko Nakamoto, Simon Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, Cal-QL outperforms state-of-the-art methods on 9/11 fine-tuning benchmark tasks that we study in this paper. Code and video are available at https://nakamotoo.github.io/Cal-QL
Researcher Affiliation | Academia | Mitsuhiko Nakamoto (1), Yuexiang Zhai (1), Anikait Singh (1), Max Sobol Mark (2), Yi Ma (1), Chelsea Finn (2), Aviral Kumar (1), Sergey Levine (1); (1) UC Berkeley, (2) Stanford University
Pseudocode | Yes | Algorithm 1 Cal-QL pseudo-code (see the sketch after this table)
Open Source Code | Yes | Code and video are available at https://nakamotoo.github.io/Cal-QL
Open Datasets | Yes | We evaluate Cal-QL on a number of benchmark tasks and datasets used by prior works [30, 45] to evaluate fine-tuning performance: (1) The Ant Maze tasks from D4RL [10] that require controlling an ant quadruped robot to navigate from a starting point to a desired goal location in a maze.
Dataset Splits | No | The paper describes training and fine-tuning phases with specific step counts, but does not explicitly provide training/validation/test *dataset splits* with percentages or sample counts.
Hardware Specification | Yes | We used a single NVIDIA TITAN RTX chip to run each of our experiments.
Software Dependencies | No | The paper mentions building code upon 'Jax CQL' but does not provide specific version numbers for Python, JAX, PyTorch, or other libraries.
Experiment Setup | Yes | We list the hyperparameters for CQL and Cal-QL in Table 3. We utilized a variant of Bellman backup that computes the target value by performing a maximization over target values computed for k actions sampled from the policy at the next state, where we used k = 4 in the visual pick-and-place domain and k = 10 in others. (A sketch of this max-backup target appears below.)
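The Pseudocode row refers to the paper's Algorithm 1. As a quick illustration of the calibration idea it describes, here is a minimal, hypothetical JAX sketch of a Cal-QL-style conservative regularizer: the Q-values of policy actions that the CQL penalty pushes down are clipped from below by a reference value estimate (e.g., a Monte-Carlo return estimate of the behavior policy). The names `calql_regularizer`, `reference_values`, and the `alpha` default are illustrative, and the real objective uses a log-sum-exp over sampled actions plus additional terms, so treat this as a sketch rather than the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of the calibration step in a
# CQL-style regularizer: Q-values for policy actions are clipped from below
# by a reference value estimate, so the learned Q-function is never pushed
# below that reference ("calibrated" pessimism).
import jax.numpy as jnp


def calql_regularizer(q_policy_actions, q_data_actions, reference_values, alpha=5.0):
    """Simplified calibrated conservative penalty.

    q_policy_actions: Q(s, a') for k actions sampled from the current policy, shape [batch, k].
    q_data_actions:   Q(s, a) for the actions stored in the dataset, shape [batch].
    reference_values: reference value estimates V^mu(s), shape [batch].
    alpha:            penalty weight (illustrative default).
    """
    # Calibration: never push policy-action Q-values below the reference value.
    calibrated_q = jnp.maximum(q_policy_actions, reference_values[:, None])
    # CQL-style penalty: push down (calibrated) policy-action values and push up
    # dataset-action values. The full objective uses a log-sum-exp over sampled
    # actions; a plain mean is used here only for brevity.
    return alpha * (jnp.mean(calibrated_q) - jnp.mean(q_data_actions))
```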
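The Experiment Setup row quotes the paper's max-backup variant of the Bellman target. The following is a hypothetical sketch of that computation under standard actor-critic assumptions; `sample_actions` and `target_q` are placeholder callables, not names from the released codebase.

```python
# Hypothetical sketch of the max-backup target: the TD target takes a maximum
# over target-network Q-values for k actions sampled from the policy at the
# next state (the paper reports k = 10 for state-based tasks and k = 4 for the
# visual pick-and-place domain).
import jax
import jax.numpy as jnp


def max_backup_target(rng, sample_actions, target_q, next_obs, rewards, dones,
                      k=10, discount=0.99):
    """Compute r + gamma * (1 - done) * max_i Q_target(s', a'_i), with a'_i ~ pi(.|s')."""
    rngs = jax.random.split(rng, k)
    # Sample k candidate actions per next state and evaluate the target Q-network.
    candidate_q = jnp.stack(
        [target_q(next_obs, sample_actions(r, next_obs)) for r in rngs], axis=1)  # [batch, k]
    next_value = jnp.max(candidate_q, axis=1)
    return rewards + discount * (1.0 - dones) * next_value
```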