Robust Multi-Task Learning with Excess Risks
Authors: Yifei He, Shiji Zhou, Guojun Zhang, Hyokun Yun, Yi Xu, Belinda Zeng, Trishul Chilimbi, Han Zhao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate our algorithm on various MTL benchmarks and demonstrate its superior performance over existing methods in the presence of label noise. |
| Researcher Affiliation | Collaboration | 1) Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA; 2) Department of Automation, Tsinghua University, Beijing, China; 3) School of Computer Science, University of Waterloo, Waterloo, ON, Canada; 4) Amazon, Seattle, WA, USA. |
| Pseudocode | Yes | Algorithm 1 ExcessMTL |
| Open Source Code | Yes | Our code is available at https://github.com/yifei-he/ExcessMTL. |
| Open Datasets | Yes | MultiMNIST (Sabour et al., 2017) is a multi-task version of the MNIST dataset. Office-Home (Venkateswara et al., 2017) consists of four image classification tasks: artistic images, clip art, product images, and real-world images. NYUv2 (Silberman et al., 2012) consists of RGB-D indoor images. |
| Dataset Splits | Yes | On the Office-Home dataset, we allocate 20% of the training data as a clean validation set and inject noise into the remainder. |
| Hardware Specification | Yes | The experiments are run on NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer and ReLU activation' and 'For the implementation of baselines, we use the code from Lin & Zhang (2023) and Navon et al. (2022)'. However, it does not provide specific version numbers for these software components or the underlying programming languages/frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | On MultiMNIST, we use a two-layer CNN with kernel size 5 followed by one fully connected layer with 80 hidden units as the feature extractor, trained with learning rate 1e-3. On Office-Home, we use a ResNet-18 (without pretraining) as the shared feature extractor, trained with a weight decay of 1e-5 and a learning rate of 1e-4. On NYUv2... we use a weight decay of 1e-3 and a learning rate of 1e-4. To address this, we deploy a warm-up strategy, where we do not perform weight updates in the first 3 epochs and instead collect the average risks over those epochs as an estimate of the initial excess risk. |
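
The Dataset Splits row above reports that, on Office-Home, 20% of the training data is held out as a clean validation set and noise is injected into the remainder. The following PyTorch sketch illustrates one way to implement that protocol; the symmetric (uniform) noise model and the `noise_rate` value are assumptions for illustration, not details quoted from the paper.

```python
import torch

def split_and_corrupt(labels, num_classes, noise_rate=0.4, seed=0):
    """Hold out 20% of indices as a clean validation set and inject
    symmetric label noise into the remaining training labels.

    labels: 1-D LongTensor of ground-truth labels for the full training set.
    Returns (train_idx, val_idx, noisy_labels).
    """
    g = torch.Generator().manual_seed(seed)
    n = labels.numel()
    perm = torch.randperm(n, generator=g)
    n_val = int(0.2 * n)                      # 20% clean validation split
    val_idx, train_idx = perm[:n_val], perm[n_val:]

    noisy_labels = labels.clone()
    flip = torch.rand(train_idx.numel(), generator=g) < noise_rate
    # Draw a replacement class uniformly from the other classes, so a
    # flipped label never stays equal to the original one.
    offsets = torch.randint(1, num_classes, (int(flip.sum()),), generator=g)
    chosen = train_idx[flip]
    noisy_labels[chosen] = (labels[chosen] + offsets) % num_classes
    return train_idx, val_idx, noisy_labels
```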
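The Experiment Setup row describes the MultiMNIST backbone only at a high level: a two-layer CNN with kernel size 5 followed by one fully connected layer with 80 hidden units, trained with learning rate 1e-3 (Adam). A minimal sketch consistent with that description is given below; the convolutional channel counts, pooling layers, and per-task head sizes are assumptions, not values quoted from the paper.

```python
import torch
import torch.nn as nn

class MultiMNISTBackbone(nn.Module):
    """Shared feature extractor: two conv layers with kernel size 5,
    followed by one fully connected layer with 80 hidden units."""
    def __init__(self, in_channels=1, hidden=80):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 10, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(10, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.LazyLinear(hidden)   # infers the flattened size on first call

    def forward(self, x):
        z = self.features(x).flatten(1)
        return torch.relu(self.fc(z))

# Two task-specific classification heads (one per MultiMNIST digit task),
# trained jointly with the backbone using Adam at the quoted learning rate.
backbone = MultiMNISTBackbone()
heads = nn.ModuleList([nn.Linear(80, 10) for _ in range(2)])
optimizer = torch.optim.Adam(
    list(backbone.parameters()) + list(heads.parameters()), lr=1e-3)
```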
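The same row also quotes the warm-up strategy: no weight updates in the first 3 epochs, with the average risks collected over those epochs serving as the estimate of the initial excess risk. The sketch below interprets "weight updates" as updates to the task weights (model training continues during warm-up) and converts the collected averages into weights by simple proportional re-weighting; both choices are illustrative assumptions and this is not a reproduction of the paper's Algorithm 1 (ExcessMTL).

```python
import torch

def train_with_warmup(backbone, heads, loader, loss_fns, optimizer,
                      warmup_epochs=3, total_epochs=20):
    """Keep task weights uniform for the first `warmup_epochs` epochs while
    accumulating per-task average risks, then use those averages as the
    initial excess-risk estimate for task re-weighting."""
    num_tasks = len(heads)
    avg_risk = torch.zeros(num_tasks)
    weights = torch.full((num_tasks,), 1.0 / num_tasks)
    steps = 0

    for epoch in range(total_epochs):
        for x, targets in loader:          # targets: list of per-task labels
            feats = backbone(x)
            losses = torch.stack([loss_fns[t](heads[t](feats), targets[t])
                                  for t in range(num_tasks)])
            (weights * losses).sum().backward()
            optimizer.step()
            optimizer.zero_grad()

            if epoch < warmup_epochs:      # collect average risks during warm-up
                avg_risk += losses.detach()
                steps += 1

        if epoch == warmup_epochs - 1:     # initialize the excess-risk estimate
            excess = avg_risk / steps
            weights = excess / excess.sum()
    return weights
```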