Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Variational Task Vector Composition

Authors: Boyuan Zhang, Yingjun Du, Xiantong Zhen, Ling Shao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that our method achieves state-of-the-art average performance across a diverse range of benchmarks, including image classification and natural language understanding. These findings highlight the practical value of our approach, offering a new, efficient, and effective framework for task vector composition. We evaluate our framework on diverse benchmarks in computer vision and natural language processing. Experimental results show that VTVC achieves strong performance in various task arithmetic scenarios, including task addition and task negation. Our analyses of the three main contributions (variational composition, Spike-and-Slab priors, and the gated sampling process) highlight the method's advantages.
Researcher Affiliation	Collaboration	(1) UCAS-Terminus AI Lab, School of Engineering Science, University of Chinese Academy of Sciences; (2) VIS Lab, University of Amsterdam; (3) Central Research Institute, United Imaging Healthcare, Co., Ltd.
Pseudocode	No	The paper describes its methodology using mathematical equations and textual explanations in Section 4 'Methodology' and Appendix A 'A Mathematical Derivation of Variational Composition of Task Vectors', but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	We provide all experimental details in the experiment section. The details and Python code of our method can be found in the Appendix and supplemental materials.
Open Datasets	Yes	For multi-task model merging, we follow previous work and use eight image classification datasets: Cars [20], DTD [5], EuroSAT [12], GTSRB [39], MNIST [22], RESISC45 [3], SUN397 [45], and SVHN [29]. For NLP applications, we use the General Language Understanding Evaluation (GLUE) benchmark [42].
Dataset Splits	Yes	We follow the standard experimental setup in Variational Task Vector Composition, where the data splits are set to be the same as in previous work, aTLAS [51].
Hardware Specification	Yes	All experiments were conducted on eight NVIDIA A40 GPUs.
Software Dependencies	No	The paper mentions specific models (e.g., CLIP Vision Transformer (ViT) architectures: ViT-B/16, ViT-B/32, and ViT-L/14; RoBERTa) and optimizers (AdamW) but does not provide specific version numbers for the software libraries (e.g., PyTorch, TensorFlow) or programming languages used to implement the methodology.
Experiment Setup	Yes	Implementation Details. We use three pre-trained CLIP Vision Transformer (ViT) architectures: ViT-B/16, ViT-B/32, and ViT-L/14 [31, 7]. For NLP tasks, we use RoBERTa [24] as the backbone model. All experiments are trained with the AdamW optimizer [25]. For our method's gated posterior, we set the base threshold (ψ_1) to 0.05 and the sensitivity parameter (ψ_2) to 1.0. All experiments were conducted on eight NVIDIA A40 GPUs. Further details on hyperparameters and task-specific settings are provided in Appendix B. Appendix B: Regularization and Hyperparameter Settings: In our experimental implementation, we used a set of meticulously tuned hyperparameters: boundary width m = 0.1, boundary loss weight λ_b = 10^-4, exploration loss weight λ_e = 10^-3, uncertainty loss weight λ_u = 10^-2, base threshold initial value ψ_1^0 = 0.05, uncertainty sensitivity initial value ψ_2^0 = 1.0, global regularization coefficient λ = 10^-3 (Equation 13), and gating temperature parameter ρ = 20.0 (Equation 10).
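
For readers attempting a reproduction, the hyperparameters reported in the Experiment Setup row can be collected into a single configuration object. The following is a minimal sketch, not the authors' code: the class name `VTVCHyperparams` and all field names are hypothetical, and the negative exponents on the loss weights and regularization coefficient are an assumption, since the extracted text appears to have lost the sign.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VTVCHyperparams:
    """Hypothetical container for the values quoted from the paper.

    Field names are illustrative only; they do not come from the
    authors' released code.
    """

    boundary_width: float = 0.1            # m
    # Exponent signs below are assumed negative (sign lost in extraction).
    boundary_loss_weight: float = 1e-4     # lambda_b
    exploration_loss_weight: float = 1e-3  # lambda_e
    uncertainty_loss_weight: float = 1e-2  # lambda_u
    base_threshold_init: float = 0.05      # psi_1^0
    sensitivity_init: float = 1.0          # psi_2^0
    global_reg_coeff: float = 1e-3         # lambda (Equation 13)
    gating_temperature: float = 20.0       # rho (Equation 10)
    optimizer: str = "AdamW"
    backbones: tuple = ("ViT-B/16", "ViT-B/32", "ViT-L/14", "RoBERTa")


config = VTVCHyperparams()

# Dump the configuration as a plain dict, e.g. for logging alongside results.
for name, value in asdict(config).items():
    print(f"{name}: {value}")
```

Freezing the dataclass keeps the recorded values from being changed mid-run, which helps when logging the exact settings next to each result. Library versions are not reported in the paper, so they would need to be recorded separately.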