Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Continual Optimization with Symmetry Teleportation for Multi-Task Learning

Authors: Zhipeng Zhou, Ziqiao Meng, Pengcheng Wu, Peilin Zhao, Chunyan Miao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on multiple mainstream datasets demonstrate the effectiveness of our approach. COST is a plug-and-play solution that enhances a wide range of existing MTL methods. When integrated with state-of-the-art methods, COST achieves superior performance.
Researcher Affiliation	Academia	1 Nanyang Technological University; 2 National University of Singapore; 3 School of Artificial Intelligence, Shanghai Jiao Tong University
Pseudocode	Yes	C Algorithm. We conclude the learning paradigm of COST in Algorithm 1. It should be noted that since COST is a scalable framework, the other MTL optimization in Algorithm 1 could be mainstream MTL approaches (e.g., CAGrad, Nash-MTL, and FairGrad).
Open Source Code	Yes	Code is available at https://github.com/zzpustc/COST.
Open Datasets	Yes	Dense Prediction. Cityscapes [Cordts et al., 2016] and NYUv2 [Silberman et al., 2012] are two widely-used scene understanding datasets, which are employed for the evaluation of MTL. Image Classification. CelebA [Liu et al., 2015] is a commonly utilized face attributes dataset [Wang et al., 2024] that contains over 200,000 images and is annotated with 40 attributes. Regression. QM9 [Ramakrishnan et al., 2014] is another widely used drug discovery MTL dataset specifically for regression tasks.
Dataset Splits	Yes	In line with the implementation and training strategy of FairGrad [Ban and Ji, 2024], we construct our model using SegNet [Badrinarayanan et al., 2017] and employ MTAN [Liu et al., 2019] as the backbone within it. We train our model with the Adam optimizer for a total of 200 epochs, setting the initial learning rate to 1.0e-4 and reducing it to half after 100 epochs. The batch size is set to 2 for NYUv2 and 8 for Cityscapes, respectively. In accordance with the setup of FairGrad, we utilize a 9-layer convolutional neural network (CNN) as the backbone and linear layers as the task-specific heads on top of it. We train our model with the Adam optimizer for a total of 15 epochs, setting the initial learning rate to 3.0e-4. Moreover, the batch size is set to 256.
Hardware Specification	Yes	All experiments are carried out on a single Tesla V100 GPU.
Software Dependencies	No	The paper mentions "Adam optimizer" and "Hugging Face’s PEFT" but does not specify their version numbers.
Experiment Setup	Yes	We train our model with the Adam optimizer for a total of 200 epochs, setting the initial learning rate to 1.0e-4 and reducing it to half after 100 epochs. The batch size is set to 2 for NYUv2 and 8 for Cityscapes, respectively. For CelebA: We train our model with the Adam optimizer for a total of 15 epochs, setting the initial learning rate to 3.0e-4. Moreover, the batch size is set to 256. For QM9: Our approach is trained for 300 epochs with a batch size of 120. The initial learning rate is set to 1.0e-3, and a learning rate scheduler is applied to reduce the rate when the validation performance shows no further improvement. $L_{lora} = L_t + \gamma L_g$ (11), where $\gamma$ is the hyper-parameter (set as 0.1).
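
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. The following is a minimal illustration only: the dictionary layout, key names, and the function `dense_prediction_lr` are hypothetical and are not taken from the COST repository. It also assumes epochs are counted from 0, so that "reducing it to half after 100 epochs" means epochs 0 to 99 use the initial rate and epochs 100 to 199 use half of it.

```python
# Values copied from the Experiment Setup row; the structure and names
# below are illustrative assumptions, not the authors' code.
SETUPS = {
    "NYUv2": {"epochs": 200, "batch_size": 2, "initial_lr": 1.0e-4},
    "Cityscapes": {"epochs": 200, "batch_size": 8, "initial_lr": 1.0e-4},
    "CelebA": {"epochs": 15, "batch_size": 256, "initial_lr": 3.0e-4},
    "QM9": {"epochs": 300, "batch_size": 120, "initial_lr": 1.0e-3},
}


def dense_prediction_lr(epoch, initial_lr=1.0e-4, halve_after=100):
    """Step schedule for the NYUv2/Cityscapes setup.

    Assumes 0-indexed epochs: the initial rate is used for the first
    `halve_after` epochs and half of it for every epoch afterwards.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_lr if epoch < halve_after else initial_lr * 0.5


if __name__ == "__main__":
    cfg = SETUPS["NYUv2"]
    for epoch in (0, 99, 100, cfg["epochs"] - 1):
        print(epoch, dense_prediction_lr(epoch, cfg["initial_lr"]))
```

The QM9 schedule is left out of the function on purpose: the row only says the rate is reduced when validation performance stops improving, without giving a reduction factor or patience, so those values cannot be filled in from this page.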