Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective

Authors: Xiaohan Qin, Xiaoxing Wang, Ning Liao, Junchi Yan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our methods achieve state-of-the-art performance across a wide range of benchmarks, including both multi-task supervised learning and multi-task reinforcement learning. Source code is available at https://github.com/jianke0604/NTKMTL. 4 Experiments
Researcher Affiliation	Academia	Xiaohan Qin1,2, Xiaoxing Wang1, Ning Liao1, Junchi Yan1,2, 1School of Artificial Intelligence & School of Computer Science, Shanghai Jiao Tong University 2Shanghai Innovation Institute
Pseudocode	Yes	Algorithm 1: NTKMTL; Algorithm 2: NTKMTL-SR
Open Source Code	Yes	Source code is available at https://github.com/jianke0604/NTKMTL.
Open Datasets	Yes	multi-task supervised learning, experiments are conducted on several benchmarks, including dense prediction tasks on the NYUv2 [49] and Cityscapes [15] datasets, regression tasks on the QM9 [8] dataset, and image-level classification on the CelebA [36] dataset. For multi-task reinforcement learning, experiments are performed in the MT10 environment from the Meta-World benchmark [59].
Dataset Splits	No	multi-task supervised learning, experiments are conducted on several benchmarks, including dense prediction tasks on the NYUv2 [49] and Cityscapes [15] datasets, regression tasks on the QM9 [8] dataset, and image-level classification on the CelebA [36] dataset. For multi-task reinforcement learning, experiments are performed in the MT10 environment from the Meta-World benchmark [59].
Hardware Specification	Yes	All experiments were performed on a single NVIDIA RTX 4090. We also compared the training time in Fig. 1.
Software Dependencies	No	Specifically, we follow the setup outlined in previous works [39, 31] and use Soft Actor-Critic (SAC) [21] as the core algorithm. Our implementation builds upon the MTRL codebase from [39, 4], training the model for 2 million steps with a batch size of 1280.
Experiment Setup	Yes	On NYUv2 and Cityscapes, we follow the training settings of [39, 4], including data augmentation for all compared methods. Training runs for 200 epochs, with the learning rate initialized at 10^-4 and reduced to 5×10^-5 after 100 epochs. The architecture is the SegNet-based [3] Multi-Task Attention Network (MTAN) [33]. Batch sizes are 2 (NYUv2) and 8 (Cityscapes), and the hyperparameter n for NTKMTL-SR on NYUv2 is set to 2. Our setup for the CelebA benchmark aligns with the configuration detailed in [31]. We employ a 9-layer CNN as the network backbone, coupled with separate linear layers for each task. The method is trained for 15 epochs; optimization is carried out using Adam with a batch size of 256. Our implementation builds upon the MTRL codebase from [39, 4], training the model for 2 million steps with a batch size of 1280.
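
The NYUv2 and Cityscapes settings quoted in the Experiment Setup row can be collected in a small configuration sketch. This is an illustration only and is not taken from the authors' repository: the class name DensePredictionSetup, its field names, and the 0-indexed epoch boundary for the learning-rate drop are all assumptions made for the example. Only the numeric values (200 epochs, learning rates 10^-4 and 5×10^-5, drop after 100 epochs, batch sizes 2 and 8) come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DensePredictionSetup:
    """Hypothetical container for the reported NYUv2 / Cityscapes settings."""

    dataset: str
    batch_size: int
    epochs: int = 200            # reported: training runs for 200 epochs
    initial_lr: float = 1e-4     # reported: learning rate initialized at 10^-4
    reduced_lr: float = 5e-5     # reported: reduced to 5x10^-5
    lr_drop_epoch: int = 100     # assumption: 0-indexed, drop applies from epoch 100 on

    def learning_rate(self, epoch: int) -> float:
        """Return the step-schedule learning rate for a 0-indexed epoch."""
        if not 0 <= epoch < self.epochs:
            raise ValueError(f"epoch must be in [0, {self.epochs}), got {epoch}")
        return self.initial_lr if epoch < self.lr_drop_epoch else self.reduced_lr


# Batch sizes as reported: 2 for NYUv2, 8 for Cityscapes.
NYUV2 = DensePredictionSetup(dataset="NYUv2", batch_size=2)
CITYSCAPES = DensePredictionSetup(dataset="Cityscapes", batch_size=8)
```

Calling NYUV2.learning_rate(epoch) inside a training loop would give the rate to apply at each epoch. Whether the drop takes effect at the start of epoch 100 or one epoch later is not stated in the quoted text, so that boundary should be checked against the released code.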