Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective

Authors: Xiaohan Qin, Xiaoxing Wang, Ning Liao, Junchi Yan

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments demonstrate that our methods achieve state-of-the-art performance across a wide range of benchmarks, including both multi-task supervised learning and multi-task reinforcement learning. Source code is available at https://github.com/jianke0604/NTKMTL. 4 Experiments |
| Researcher Affiliation | Academia | Xiaohan Qin¹,², Xiaoxing Wang¹, Ning Liao¹, Junchi Yan¹,²; ¹School of Artificial Intelligence & School of Computer Science, Shanghai Jiao Tong University; ²Shanghai Innovation Institute |
| Pseudocode | Yes | Algorithm 1: NTKMTL; Algorithm 2: NTKMTL-SR |
| Open Source Code | Yes | Source code is available at https://github.com/jianke0604/NTKMTL. |
| Open Datasets | Yes | multi-task supervised learning, experiments are conducted on several benchmarks, including dense prediction tasks on the NYUv2 [49] and Cityscapes [15] datasets, regression tasks on the QM9 [8] dataset, and image-level classification on the CelebA [36] dataset. For multi-task reinforcement learning, experiments are performed in the MT10 environment from the Meta-World benchmark [59]. |
| Dataset Splits | No | multi-task supervised learning, experiments are conducted on several benchmarks, including dense prediction tasks on the NYUv2 [49] and Cityscapes [15] datasets, regression tasks on the QM9 [8] dataset, and image-level classification on the CelebA [36] dataset. For multi-task reinforcement learning, experiments are performed in the MT10 environment from the Meta-World benchmark [59]. |
| Hardware Specification | Yes | All experiments were performed on a single NVIDIA RTX 4090. We also compared the training time in Fig. 1. |
| Software Dependencies | No | Specifically, we follow the setup outlined in previous works [39, 31] and use Soft Actor-Critic (SAC) [21] as the core algorithm. Our implementation builds upon the MTRL codebase from [39, 4], training the model for 2 million steps with a batch size of 1280. |
| Experiment Setup | Yes | On NYUv2 and Cityscapes, we follow the training settings of [39, 4], including data augmentation for all compared methods. Training runs for 200 epochs, with the learning rate initialized at 10⁻⁴ and reduced to 5×10⁻⁵ after 100 epochs. The architecture is the SegNet-based [3] Multi-Task Attention Network (MTAN) [33]. Batch sizes are 2 (NYUv2) and 8 (Cityscapes), and the hyperparameter n for NTKMTL-SR on NYUv2 is set to 2. Our setup for the CelebA benchmark aligns with the configuration detailed in [31]. We employ a 9-layer CNN as the network backbone, coupled with separate linear layers for each task. The method is trained for 15 epochs; optimization is carried out using Adam with a batch size of 256. Our implementation builds upon the MTRL codebase from [39, 4], training the model for 2 million steps with a batch size of 1280. |
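
The learning-rate schedule quoted under Experiment Setup for NYUv2 and Cityscapes (200 epochs, starting at 10⁻⁴ and dropping to 5×10⁻⁵ after 100 epochs) can be illustrated with a minimal sketch. This is not taken from the authors' repository: the function name `learning_rate`, its parameter names, and the choice of 0-indexed epochs with the drop taking effect at epoch index 100 are all assumptions made for illustration.

```python
def learning_rate(
    epoch: int,
    base_lr: float = 1e-4,      # initial learning rate quoted in the setup
    reduced_lr: float = 5e-5,   # learning rate after the drop
    drop_epoch: int = 100,      # assumption: drop applies from epoch index 100
) -> float:
    """Hypothetical step schedule matching the quoted NYUv2/Cityscapes setup."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return base_lr if epoch < drop_epoch else reduced_lr


# Learning rate for each of the 200 training epochs (0-indexed, an assumption).
schedule = [learning_rate(epoch) for epoch in range(200)]
```

Under these assumptions the first 100 entries of `schedule` hold the initial rate and the remaining 100 hold the reduced rate; the exact epoch at which the authors apply the drop should be checked against their released code.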