Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Tackling Continual Offline RL through Selective Weights Activation on Aligned Spaces

Authors: Jifeng Hu, Sili Huang, Li Shen, Zhejian Yang, Shengchao Hu, Shisong Tang, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we conduct extensive experiments on 15 CL tasks, including conventional CL settings and any CL task sequence settings. The results show that our method surpasses or matches the SOTA performance compared with 17 representative baselines. In this section, we will introduce environmental settings, evaluation metrics, and baselines in the following sections. Then, we will report and analyze the comparison results, ablation study, and parameter sensitivity analysis.
Researcher Affiliation | Academia | Jifeng Hu (1), Sili Huang (2), Li Shen (3), Zhejian Yang (1), Shengchao Hu (4), Shisong Tang (5), Hechang Chen (1), Lichao Sun (6), Yi Chang (1), Dacheng Tao (7). (1) Jilin University; (2) Minzu University of China; (3) Shenzhen Campus of Sun Yat-sen University; (4) Shanghai Jiao Tong University; (5) Tsinghua University; (6) Lehigh University; (7) Nanyang Technological University
Pseudocode | Yes | Algorithm 1: Vector-Quantized Continual Diffuser (VQ-CD) ... Algorithm 2: Evaluation Process
Open Source Code | Yes | The source code is available at https://github.com/JFHu/Vector_Quantized_Continual_Diffuser.
Open Datasets | Yes | Following previous studies [98], we select MuJoCo Ant-dir and Continual World (CW) to formulate traditional CL settings with the same state and action spaces. ... Additionally, we propose to leverage D4RL tasks [18] to construct the CL settings with diverse state and action spaces
Dataset Splits | Yes | Following previous studies [98], we select MuJoCo Ant-dir and Continual World (CW) to formulate traditional CL settings with the same state and action spaces. In Ant-dir, we select 10-15-19-25 and 4-18-26-34-42-49, for training and evaluation. In CW, we adopt the task setting of CW10, which contains 10 robotic manipulation tasks. Additionally, we propose to leverage D4RL tasks [18] to construct the CL settings with diverse state and action spaces, where the task datasets in D4RL (Hopper, Walker2d, and HalfCheetah) contains 6 difficulty settings (random, medium, expert, medium-expert, medium-replay, and full-replay).
Hardware Specification | Yes | We conduct the experiments on NVIDIA GeForce RTX 3090 GPUs and NVIDIA A10 GPUs, and the CPU type is Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz. ... We conduct the experiment with NVIDIA GeForce RTX 3090 GPUs and Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz.
Software Dependencies | No | For all models, we use the Adam [54] optimizer to perform parameter updating. ... All the comparison methods used in this paper utilize their official codebases. Specifically, For L2M, we use the official source code: https://github.com/ml-jku/L2M. For CuGRO, we use the official source code: https://github.com/NJU-RL/CuGRO. For CoD, we use the official source code: https://github.com/JF-Hu/Continual_Diffuser. For MTDIFF, we use the official source code: https://openreview.net/forum?id=fAdMly4ki5.
Experiment Setup | Yes | We classify the hyperparameters shown in Table 3 into three categories: QSA module-related, SWA module-related, and training-related hyperparameters. We use the learning rate schedule when pretraining the QSA module, so the VQ learning rate decreases from 1e-3 to 1e-4. In our experiments, the maximum diffusion steps are set as 200, and the default structure is Unet. Usually, it is time-consuming for the diffusion-based model to generate actions in RL. Thus, we consider the speed-up technique DDIM [83] and realize it in our method to improve the generation efficiency during evaluation. For all models, we use the Adam [54] optimizer to perform parameter updating. ... Table 3: The hyperparameters of VQ-CD.
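
The Experiment Setup excerpt gives two numeric settings: a VQ learning rate that decreases from 1e-3 to 1e-4, and a maximum of 200 diffusion steps that DDIM shortens at evaluation time. The sketch below shows one way these settings could look in code. It is an illustration, not the authors' implementation: the excerpt does not give the shape of the schedule (a geometric decay is assumed here), the number of DDIM sampling steps (20 is a hypothetical value), or any function names (`vq_learning_rate` and `ddim_timesteps` are invented for this example).

```python
from typing import List


def vq_learning_rate(step: int, total_steps: int,
                     lr_start: float = 1e-3, lr_end: float = 1e-4) -> float:
    """Learning rate at `step`, decaying from lr_start to lr_end.

    The endpoints come from the excerpt; the geometric (exponential)
    shape is an assumption, since the schedule type is not stated.
    """
    if total_steps <= 1:
        return lr_end
    # Clamp the step into [0, total_steps - 1], then normalise to [0, 1].
    clamped = min(max(step, 0), total_steps - 1)
    frac = clamped / (total_steps - 1)
    return lr_start * (lr_end / lr_start) ** frac


def ddim_timesteps(max_steps: int = 200, num_sample_steps: int = 20) -> List[int]:
    """Evenly spaced subsequence of diffusion timesteps, highest first.

    max_steps=200 matches the excerpt; num_sample_steps=20 is a
    hypothetical choice made only for this illustration.
    """
    if num_sample_steps <= 0 or max_steps % num_sample_steps != 0:
        raise ValueError("num_sample_steps must be positive and divide max_steps")
    stride = max_steps // num_sample_steps
    # Sampling runs from the noisiest retained step down to step 0.
    return list(range(0, max_steps, stride))[::-1]


if __name__ == "__main__":
    print(vq_learning_rate(0, 1000), vq_learning_rate(999, 1000))
    print(ddim_timesteps())
```

Under these assumptions the sampler visits 20 of the 200 timesteps, which is the kind of reduction that makes DDIM attractive for action generation during evaluation.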