Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Authors: Mitsuhiko Nakamoto, Yuexiang Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, Cal-QL outperforms state-of-the-art methods on 9/11 fine-tuning benchmark tasks that we study in this paper. Code and video are available at https://nakamotoo.github.io/Cal-QL |
| Researcher Affiliation | Academia | Mitsuhiko Nakamoto¹, Yuexiang Zhai¹, Anikait Singh¹, Max Sobol Mark², Yi Ma¹, Chelsea Finn², Aviral Kumar¹, Sergey Levine¹ (¹UC Berkeley, ²Stanford University) |
| Pseudocode | Yes | Algorithm 1 Cal-QL pseudo-code (a hedged sketch of the core regularizer appears below the table) |
| Open Source Code | Yes | Code and video are available at https://nakamotoo.github.io/Cal-QL |
| Open Datasets | Yes | We evaluate Cal-QL on a number of benchmark tasks and datasets used by prior works [30, 45] to evaluate fine-tuning performance: (1) The Ant Maze tasks from D4RL [10] that require controlling an ant quadruped robot to navigate from a starting point to a desired goal location in a maze. (A dataset-loading sketch appears below the table.) |
| Dataset Splits | No | The paper describes training and fine-tuning phases with specific step counts, but does not explicitly provide training/validation/test *dataset splits* with percentages or sample counts. |
| Hardware Specification | Yes | We used a single NVIDIA TITAN RTX chip to run each of our experiments. |
| Software Dependencies | No | The paper mentions building its code upon 'Jax CQL' but does not provide specific version numbers for Python, JAX, PyTorch, or other libraries. |
| Experiment Setup | Yes | We list the hyperparameters for CQL and Cal-QL in Table 3. We utilized a variant of Bellman backup that computes the target value by performing a maximization over target values computed for k actions sampled from the policy at the next state, where we used k = 4 in the visual pick-and-place domain and k = 10 in others. (A sketch of this backup appears below the table.) |
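
The paper's Algorithm 1 centers on a calibrated variant of the CQL critic regularizer: the Q-values being pushed down are clipped from below at a reference value, such as a Monte Carlo return estimate of the behavior policy. Below is a minimal, hedged sketch of that one change; `calql_regularizer` and its argument names are illustrative, not taken from the authors' code.

```python
import jax.numpy as jnp

def calql_regularizer(q_policy, q_data, ref_value, alpha=5.0):
    """Sketch of Cal-QL's calibrated CQL regularizer (names are illustrative).

    q_policy:  Q(s, a') for actions a' sampled from the learned policy, shape (batch,)
    q_data:    Q(s, a) for state-action pairs from the dataset, shape (batch,)
    ref_value: reference values V^mu(s), e.g. Monte Carlo returns of the
               behavior policy, shape (batch,)
    alpha:     conservatism coefficient (5.0 is only a placeholder)
    """
    # Cal-QL's change to CQL: never push Q(s, a') below the reference value,
    # which keeps the learned Q-function calibrated against V^mu.
    calibrated_q = jnp.maximum(q_policy, ref_value)
    # CQL-style regularizer: push down on policy actions, up on dataset actions.
    return alpha * (calibrated_q.mean() - q_data.mean())
```

The paper describes this clipping as a small change on top of CQL, so the rest of the training loop can stay identical to a standard CQL implementation.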
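
The AntMaze tasks in the 'Open Datasets' row come from the public D4RL benchmark, so the data can be fetched directly. A minimal sketch, assuming the `d4rl` package is installed; the specific maze variant shown here is illustrative and may not match every task in the paper:

```python
import gym
import d4rl  # importing d4rl registers the D4RL environments with gym

# One of the D4RL AntMaze tasks; the paper evaluates several maze variants.
env = gym.make('antmaze-medium-diverse-v2')
dataset = d4rl.qlearning_dataset(env)  # dict of observations, actions, rewards, terminals, ...
print(dataset['observations'].shape, dataset['actions'].shape)
```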
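
The max-over-k-actions Bellman backup quoted in the 'Experiment Setup' row can be sketched as below. `sample_actions` and `target_q` are assumed stand-ins for the policy sampler and the target critic, not names from the authors' code:

```python
import jax.numpy as jnp

def max_backup_target(rng, sample_actions, target_q, next_obs, rewards, dones,
                      discount=0.99, k=10):
    """Sketch of the max-backup target value (k = 10 for state-based tasks,
    k = 4 for the visual pick-and-place domain, per the paper)."""
    # Draw k candidate actions from the current policy at each next state.
    actions = sample_actions(rng, next_obs, num_samples=k)  # (batch, k, act_dim)
    # Evaluate the target critic on every candidate and keep the maximum.
    q_values = target_q(next_obs, actions)                  # (batch, k)
    max_next_q = jnp.max(q_values, axis=-1)                 # (batch,)
    # Standard one-step target with episode-termination masking.
    return rewards + discount * (1.0 - dones) * max_next_q
```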