Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Authors: Mitsuhiko Nakamoto, Simon Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, Cal-QL outperforms state-of-the-art methods on 9/11 fine-tuning benchmark tasks that we study in this paper. Code and video are available at https://nakamotoo.github.io/Cal-QL
Researcher Affiliation | Academia | Mitsuhiko Nakamoto (1), Yuexiang Zhai (1), Anikait Singh (1), Max Sobol Mark (2), Yi Ma (1), Chelsea Finn (2), Aviral Kumar (1), Sergey Levine (1); (1) UC Berkeley, (2) Stanford University
Pseudocode | Yes | Algorithm 1 Cal-QL pseudo-code (see the sketch after this table)
Open Source Code | Yes | Code and video are available at https://nakamotoo.github.io/Cal-QL
Open Datasets | Yes | We evaluate Cal-QL on a number of benchmark tasks and datasets used by prior works [30, 45] to evaluate fine-tuning performance: (1) The Ant Maze tasks from D4RL [10] that require controlling an ant quadruped robot to navigate from a starting point to a desired goal location in a maze.
Dataset Splits | No | The paper describes training and fine-tuning phases with specific step counts, but does not explicitly provide training/validation/test *dataset splits* with percentages or sample counts.
Hardware Specification | Yes | We used a single NVIDIA TITAN RTX chip to run each of our experiments.
Software Dependencies | No | The paper mentions building code upon 'Jax CQL' but does not provide specific version numbers for Python, JAX, PyTorch, or other libraries.
Experiment Setup | Yes | We list the hyperparameters for CQL and Cal-QL in Table 3. We utilized a variant of Bellman backup that computes the target value by performing a maximization over target values computed for k actions sampled from the policy at the next state, where we used k = 4 in the visual pick-and-place domain and k = 10 in others. (A sketch of this max-backup target appears below.)
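The Pseudocode row refers to the paper's Algorithm 1. As a quick illustration of the calibration idea it describes, here is a minimal, hypothetical JAX sketch of a Cal-QL-style conservative regularizer: the Q-values of policy actions that the CQL penalty pushes down are clipped from below by a reference value estimate (e.g., a Monte-Carlo return estimate of the behavior policy). The names `calql_regularizer`, `reference_values`, and the `alpha` default are illustrative, and the real objective uses a log-sum-exp over sampled actions plus additional terms, so treat this as a sketch rather than the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of the calibration step in a
# CQL-style regularizer: Q-values for policy actions are clipped from below
# by a reference value estimate, so the learned Q-function is never pushed
# below that reference ("calibrated" pessimism).
import jax.numpy as jnp


def calql_regularizer(q_policy_actions, q_data_actions, reference_values, alpha=5.0):
    """Simplified calibrated conservative penalty.

    q_policy_actions: Q(s, a') for k actions sampled from the current policy, shape [batch, k].
    q_data_actions:   Q(s, a) for the actions stored in the dataset, shape [batch].
    reference_values: reference value estimates V^mu(s), shape [batch].
    alpha:            penalty weight (illustrative default).
    """
    # Calibration: never push policy-action Q-values below the reference value.
    calibrated_q = jnp.maximum(q_policy_actions, reference_values[:, None])
    # CQL-style penalty: push down (calibrated) policy-action values and push up
    # dataset-action values. The full objective uses a log-sum-exp over sampled
    # actions; a plain mean is used here only for brevity.
    return alpha * (jnp.mean(calibrated_q) - jnp.mean(q_data_actions))
```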
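The Experiment Setup row quotes the paper's max-backup variant of the Bellman target. The following is a hypothetical sketch of that computation under standard actor-critic assumptions; `sample_actions` and `target_q` are placeholder callables, not names from the released codebase.

```python
# Hypothetical sketch of the max-backup target: the TD target takes a maximum
# over target-network Q-values for k actions sampled from the policy at the
# next state (the paper reports k = 10 for state-based tasks and k = 4 for the
# visual pick-and-place domain).
import jax
import jax.numpy as jnp


def max_backup_target(rng, sample_actions, target_q, next_obs, rewards, dones,
                      k=10, discount=0.99):
    """Compute r + gamma * (1 - done) * max_i Q_target(s', a'_i), with a'_i ~ pi(.|s')."""
    rngs = jax.random.split(rng, k)
    # Sample k candidate actions per next state and evaluate the target Q-network.
    candidate_q = jnp.stack(
        [target_q(next_obs, sample_actions(r, next_obs)) for r in rngs], axis=1)  # [batch, k]
    next_value = jnp.max(candidate_q, axis=1)
    return rewards + discount * (1.0 - dones) * next_value
```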