Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Exploring Tradeoffs through Mode Connectivity for Multi-Task Learning

Authors: Zhipeng Zhou, Ziqiao Meng, Pengcheng Wu, Peilin Zhao, Chunyan Miao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on key MTL datasets demonstrate that our proposed method, EXTRA (EXplore TRAde-offs), effectively identifies the desired point on the Pareto front and achieves state-of-the-art performance.
Researcher Affiliation	Academia	(1) Nanyang Technological University; (2) National University of Singapore; (3) School of Artificial Intelligence, Shanghai Jiao Tong University
Pseudocode	No	The paper describes its methodology in narrative text and through mathematical formulations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at https://github.com/zzpustc/EXTRA.
Open Datasets	Yes	Extensive experiments on key MTL datasets demonstrate that our proposed method, EXTRA (EXplore TRAde-offs), effectively identifies the desired point on the Pareto front and achieves state-of-the-art performance. We conduct an ablation study on the NYUv2 dataset. CelebA [Liu et al., 2015] is a widely used facial attributes dataset containing over 200,000 images annotated with 40 binary attributes. MultiMNIST: Following the experimental setup in PaMaL [Dimitriadis et al., 2023], we evaluate EXTRA on MultiMNIST, a widely used dataset for assessing how MTL/PFL methods approach the Pareto front.
Dataset Splits	Yes	The resulting dataset comprises 60,000 training, 10,000 validation, and 10,000 test samples.
Hardware Specification	Yes	All experiments were conducted on Tesla V100 GPUs.
Software Dependencies	No	The paper mentions using the Adam optimizer and specific model architectures like SegNet and MTAN, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup	Yes	The model is trained using the Adam optimizer for 200 epochs. The initial learning rate is set to 1e-4 and decayed by a factor of 2 after 100 epochs. The batch size is set to 2 for NYUv2 and 8 for CityScapes. The control point numbers are 4 and 5 on CityScapes and NYUv2, respectively, with 2 and 3 trainable in the second stage. For the CelebA dataset, we adopt a 9-layer convolutional neural network (CNN) as the backbone, with task-specific linear heads appended. The model is trained with the Adam optimizer for 15 epochs using an initial learning rate of 3e-4 and a batch size of 256. The control point number is 3, with 1 trainable in the second stage. The model is trained using Adam with a learning rate of 0.001, no learning rate scheduler, a batch size of 256, and a total of 10 training epochs. The control point number is 3, with 1 trainable in the second stage.
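
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration table for anyone attempting a rerun. The Python sketch below is only an illustration of those quoted values and is not taken from the authors' repository: the names `Setup`, `SETUPS`, and `learning_rate` are hypothetical, the key `unlabeled_third_setup` is a placeholder because the quoted text does not name the dataset for the final configuration, and the schedule assumes a single halving applied from epoch 100 onward (0-indexed). The CelebA entry keeps a constant rate only because the excerpt gives no schedule for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Setup:
    """Hyperparameters as quoted in the Experiment Setup row (Adam in all cases)."""
    epochs: int
    lr: float                       # initial learning rate
    batch_size: int
    control_points: int             # total number of control points
    trainable_control_points: int   # control points trained in the second stage
    decay_epoch: Optional[int] = None  # assumption: decay applies from this epoch on
    decay_factor: float = 1.0


# Hypothetical lookup table; keys are illustrative labels, not identifiers from the paper.
SETUPS = {
    "NYUv2": Setup(200, 1e-4, 2, 5, 3, decay_epoch=100, decay_factor=2.0),
    "CityScapes": Setup(200, 1e-4, 8, 4, 2, decay_epoch=100, decay_factor=2.0),
    "CelebA": Setup(15, 3e-4, 256, 3, 1),
    # The quoted text does not say which dataset this configuration belongs to.
    "unlabeled_third_setup": Setup(10, 1e-3, 256, 3, 1),
}


def learning_rate(setup: Setup, epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch under a one-step decay."""
    if setup.decay_epoch is not None and epoch >= setup.decay_epoch:
        return setup.lr / setup.decay_factor
    return setup.lr


if __name__ == "__main__":
    for name, setup in SETUPS.items():
        last = learning_rate(setup, setup.epochs - 1)
        print(f"{name}: start lr={setup.lr:g}, final lr={last:g}, batch={setup.batch_size}")
```

Since the Software Dependencies row reports that no library versions are given, any framework-specific translation of this table (for example, into an optimizer and scheduler) would need its versions pinned independently.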