Towards Impartial Multi-task Learning

Authors: Liyang Liu, Yi Li, Zhanghui Kuang, Jing-Hao Xue, Yimin Chen, Wenming Yang, Qingmin Liao, Wayne Zhang

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our IMTL on the standard MTL benchmarks including Cityscapes, NYUv2 and CelebA. It outperforms existing loss weighting methods under the same experimental settings. |
| Researcher Affiliation | Collaboration | Liyang Liu1, Yi Li2, Zhanghui Kuang2, Jing-Hao Xue3, Yimin Chen2, Wenming Yang1, Qingmin Liao1, Wayne Zhang2,4. 1 Shenzhen International Graduate School/Department of Electronic Engineering, Tsinghua University; 2 SenseTime Research; 3 Department of Statistical Science, University College London; 4 Qing Yuan Research Institute, Shanghai Jiao Tong University |
| Pseudocode | Yes | Algorithm 1: Training by Impartial Multi-task Learning |
| Open Source Code | Yes | We re-implement and compare with the representative MTL methods in a unified framework, which will be publicly available. |
| Open Datasets | Yes | We extensively evaluate our proposed IMTL on standard benchmarks: Cityscapes, NYUv2 and CelebA, where the experimental results show that IMTL achieves superior performances under all settings. ... We run experiments on the Cityscapes (Cordts et al., 2016), NYUv2 (Silberman et al., 2012) and CelebA (Liu et al., 2015) dataset to extensively analyze different methods. |
| Dataset Splits | Yes | For the Cityscapes dataset, ... We train on the 2975 training images and validate on the 500 validation images (1024×2048 full resolution) where ground truth labels are provided. ... On the NYUv2 dataset, ... We use the 795 training images for training and the 654 validation images for testing with 480×640 full resolution. ... CelebA contains ... We train on the 162,770 training images and test on the 19,867 validation images. |
| Hardware Specification | No | No specific GPU/CPU models, processor types, or detailed computer specifications are mentioned for running experiments. Only the number of GPUs used for batch sizes is indicated (e.g., "2 x 16 GPUs"). |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions) are provided. |
| Experiment Setup | Yes | We use polynomial learning rate with a power of 0.9, SGD with a momentum of 0.9 and weight decay of 10^-4 as the optimizer, with the model trained for 200 epochs. For the Cityscapes dataset, the batch size is 32 (2 x 16 GPUs) with the initial learning rate 0.02. On the NYUv2 dataset, the batch size is 48 (6 x 8 GPUs) with the initial learning rate 0.03. ... on CelebA ... the batch size is 256 (32 x 8 GPUs) and the model is trained from scratch for 100 epochs. |
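The polynomial learning-rate schedule quoted in the Experiment Setup row can be sketched as a small helper. This is a minimal illustration, not the authors' released code; the standard form `lr = base_lr * (1 - step / max_steps) ** power` is assumed, and the iterations-per-epoch figure is a hypothetical example derived from the quoted Cityscapes split (2975 images, batch size 32).

```python
def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Polynomial learning-rate decay: lr = base_lr * (1 - step/max_steps)**power.

    Matches the quoted setup (power 0.9); the exact step/epoch granularity
    used by the authors is not specified, so this is an assumption.
    """
    return base_lr * (1.0 - step / max_steps) ** power

# Cityscapes setting from the quoted setup: initial LR 0.02, 200 epochs.
# Hypothetical step count: 2975 images / batch 32 ~ 93 iterations per epoch.
max_steps = 200 * 93
print(poly_lr(0.02, 0, max_steps))           # -> 0.02 at the start
print(poly_lr(0.02, max_steps // 2, max_steps))  # roughly half the base LR, scaled by 0.5**0.9
print(poly_lr(0.02, max_steps, max_steps))   # -> 0.0 at the end
```

Such a schedule would typically be combined with SGD (momentum 0.9, weight decay 1e-4, per the quoted setup) by updating the optimizer's learning rate at every iteration.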