Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Injecting Imbalance Sensitivity for Multi-Task Learning

Authors: Zhipeng Zhou, Liu Liu, Peilin Zhao, Wei Gong

IJCAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To demonstrate the effectiveness of our proposed IMbalance-sensitive Gradient (IMGrad) descent method, we evaluate it on multiple mainstream MTL benchmarks, encompassing supervised learning tasks as well as reinforcement learning. The experimental results consistently demonstrate competitive performance. [...] The extensive experimental results present compelling evidence that IMGrad consistently enhances its baselines and surpasses the current advanced gradient manipulation methods in a diverse range of evaluations, e.g., supervised learning tasks, and reinforcement learning benchmarks.
Researcher Affiliation	Collaboration	Zhipeng Zhou (1), Liu Liu (2), Peilin Zhao (2) and Wei Gong (1). (1) University of Science and Technology of China; (2) Tencent AI Lab.
Pseudocode	No	The paper describes methods using mathematical formulations and textual explanations but does not contain explicit pseudocode or algorithm blocks.
Open Source Code	Yes	We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs. [Footnote 4: Code is available at https://github.com/zzpustc/IMGrad.]
Open Datasets	Yes	We conducted experiments on the Cityscapes dataset [Cordts et al., 2016] [...] NYUv2 is a widely used indoor scene understanding dataset for MTL benchmarking [...] The Cityscapes dataset is used for MTL evaluation [...] CelebA is a widely used face attributes dataset containing over 200,000 images annotated with 40 attributes. [...] we use CAGrad as the baseline and conduct experiments on the MT10 environment from the Meta-World benchmark [Yu et al., 2020b].
Dataset Splits	No	The paper uses Cityscapes, NYUv2, CelebA, and MT10 and reports the number of training epochs and batch sizes, but the provided text does not state train/validation/test split percentages or sample counts for any of these datasets. It mentions a validation set for MT10 but does not specify that split.
Hardware Specification	Yes	We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs.
Software Dependencies	Yes	We implement our approach with Python 3.8, PyTorch 1.4.0 and cvxpy 1.3.1, while all experiments are carried out on Tesla V100 GPUs.
Experiment Setup	Yes	Specifically, models are trained for 200 epochs using the Adam optimizer, with an initial learning rate of 1e-4, which decays to 5e-5 after 100 epochs. [...] The model is trained using the Adam optimizer for 15 epochs, with an initial learning rate of 3.0e-4 and a batch size of 256.
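The two quoted training settings can be captured as a small configuration sketch. This is a minimal illustration, not the authors' code. It assumes that "decays to 5e-5 after 100 epochs" means a single step decay taking effect at epoch 100 (0-indexed). The names `TrainConfig`, `SETTING_A`, `SETTING_B`, and `lr_at_epoch` are hypothetical and do not come from the IMGrad repository. The quoted text does not say which benchmark each setting belongs to, so the two settings are labeled neutrally.

```python
# Hypothetical sketch of the quoted training settings; names are illustrative
# and are not taken from the IMGrad repository.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    initial_lr: float
    decayed_lr: Optional[float] = None   # learning rate after the decay step, if any
    decay_epoch: Optional[int] = None    # assumed 0-indexed epoch where the decay takes effect
    batch_size: Optional[int] = None     # None where the quoted text gives no value
    optimizer: str = "Adam"              # both quoted settings use Adam


# First quoted setting: 200 epochs, lr 1e-4 decaying to 5e-5 after 100 epochs.
SETTING_A = TrainConfig(epochs=200, initial_lr=1e-4, decayed_lr=5e-5, decay_epoch=100)

# Second quoted setting: 15 epochs, lr 3.0e-4, batch size 256.
SETTING_B = TrainConfig(epochs=15, initial_lr=3.0e-4, batch_size=256)


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Return the learning rate for a 0-indexed epoch under a one-step decay."""
    if not 0 <= epoch < cfg.epochs:
        raise ValueError(f"epoch {epoch} outside [0, {cfg.epochs})")
    # Switch to the decayed rate once the decay epoch is reached (assumption).
    if cfg.decay_epoch is not None and cfg.decayed_lr is not None and epoch >= cfg.decay_epoch:
        return cfg.decayed_lr
    return cfg.initial_lr


if __name__ == "__main__":
    for e in (0, 99, 100, 199):
        print(e, lr_at_epoch(SETTING_A, e))
```

In PyTorch, the first schedule would commonly be expressed as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)` with `scheduler.step()` called once per epoch. Whether the released code uses this scheduler is not stated in the quoted text.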