Robust Multi-Task Learning with Excess Risks

Authors: Yifei He, Shiji Zhou, Guojun Zhang, Hyokun Yun, Yi Xu, Belinda Zeng, Trishul Chilimbi, Han Zhao

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we evaluate our algorithm on various MTL benchmarks and demonstrate its superior performance over existing methods in the presence of label noise.
Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA; (2) Department of Automation, Tsinghua University, Beijing, China; (3) School of Computer Science, University of Waterloo, Waterloo, ON, Canada; (4) Amazon, Seattle, WA, USA.
Pseudocode | Yes | Algorithm 1: ExcessMTL
Open Source Code | Yes | Our code is available at https://github.com/yifei-he/ExcessMTL.
Open Datasets | Yes | MultiMNIST (Sabour et al., 2017) is a multi-task version of the MNIST dataset. Office-Home (Venkateswara et al., 2017) consists of four image classification tasks: artistic images, clip art, product images, and real-world images. NYUv2 (Silberman et al., 2012) consists of RGB-D indoor images.
Dataset Splits | Yes | On the Office-Home dataset, we allocate 20% of the training data as a clean validation set and inject noise into the remainder. (A sketch of this split is shown below the table.)
Hardware Specification | Yes | The experiments are run on NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions software components such as the 'Adam optimizer and ReLU activation' and notes that 'For the implementation of baselines, we use the code from Lin & Zhang (2023) and Navon et al. (2022)'. However, it does not provide specific version numbers for these components or for the underlying programming languages/frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | On MultiMNIST, we use a two-layer CNN with kernel size 5 followed by one fully connected layer with 80 hidden units as the feature extractor, trained with learning rate 1e-3. On Office-Home, we use a ResNet-18 (without pretraining) as the shared feature extractor, which is trained using a weight decay of 1e-5. The learning rate is 1e-4. On NYUv2... We use a weight decay of 1e-3 and the learning rate is 1e-4. To address this, we deploy a warm-up strategy, where we do not perform weight updates in the first 3 epochs and instead collect the average risks over those epochs as an estimate of the initial excess risk. (Sketches of the MultiMNIST feature extractor and the warm-up phase are shown below the table.)
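
The Dataset Splits row describes holding out 20% of the Office-Home training data as a clean validation set and injecting label noise into the remainder. The following is a minimal Python sketch of that procedure; the symmetric noise model and the 40% noise rate are illustrative assumptions not taken from the excerpt above.

```python
import numpy as np

def split_and_corrupt(labels, num_classes, val_frac=0.2, noise_rate=0.4, seed=0):
    """Hold out a clean validation split, then inject symmetric label noise
    into the remaining training labels. The noise model and rate are
    illustrative assumptions, not the paper's exact settings."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    perm = rng.permutation(len(labels))
    n_val = int(val_frac * len(labels))
    val_idx, train_idx = perm[:n_val], perm[n_val:]   # clean 20% validation split

    noisy = labels.copy()
    flip = train_idx[rng.random(len(train_idx)) < noise_rate]
    # Map each flipped label to a uniformly chosen *different* class.
    noisy[flip] = (noisy[flip] + rng.integers(1, num_classes, size=len(flip))) % num_classes
    return train_idx, val_idx, noisy
```

Only labels at `train_idx` are corrupted in `noisy`; labels at `val_idx` are left untouched so the validation set stays clean.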
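The Experiment Setup row specifies the MultiMNIST feature extractor as a two-layer CNN with kernel size 5 followed by one fully connected layer with 80 hidden units. A PyTorch sketch under those constraints follows; the channel widths, pooling, and placement of activations are assumptions not stated in the excerpt.

```python
import torch
import torch.nn as nn

class MultiMNISTFeatureExtractor(nn.Module):
    """Two conv layers (kernel size 5) plus one fully connected layer with
    80 hidden units. Channel widths and pooling are assumed for illustration."""
    def __init__(self, in_channels=1, conv_channels=(10, 20), hidden=80):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, conv_channels[0], kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(conv_channels[0], conv_channels[1], kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.LazyLinear(hidden)  # infers the flattened input size lazily

    def forward(self, x):
        z = self.features(x)
        return torch.relu(self.fc(z.flatten(start_dim=1)))
```

Task-specific heads (e.g., one linear classifier per digit task) would sit on top of the shared 80-dimensional representation, trained with Adam at the learning rate 1e-3 quoted above.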
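The same row describes a warm-up strategy: task weights are not updated for the first 3 epochs, and the average per-task risks collected over those epochs serve as the initial excess-risk estimate. The sketch below covers only that warm-up phase, assuming a hypothetical `model(x, task_id)` interface and one loss function per task; it is not the full ExcessMTL update.

```python
import torch

def warmup_average_risks(model, task_losses, loader, optimizer, warmup_epochs=3):
    """Train with a uniform (unweighted) sum of task losses for `warmup_epochs`
    epochs and return the average per-task risks, used as the initial
    excess-risk estimate. The model(x, task_id) signature is hypothetical."""
    num_tasks = len(task_losses)
    running = torch.zeros(num_tasks)
    steps = 0
    for _ in range(warmup_epochs):
        for x, targets in loader:            # targets: one label tensor per task
            losses = torch.stack([loss_fn(model(x, t), targets[t])
                                  for t, loss_fn in enumerate(task_losses)])
            optimizer.zero_grad()
            losses.sum().backward()          # no task re-weighting during warm-up
            optimizer.step()
            running += losses.detach()
            steps += 1
    return running / steps                   # average risk per task over warm-up
```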