Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Robust Multi-Task Learning with Excess Risks
Authors: Yifei He, Shiji Zhou, Guojun Zhang, Hyokun Yun, Yi Xu, Belinda Zeng, Trishul Chilimbi, Han Zhao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate our algorithm on various MTL benchmarks and demonstrate its superior performance over existing methods in the presence of label noise. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL, USA ²Department of Automation, Tsinghua University, Beijing, China ³School of Computer Science, University of Waterloo, Waterloo, ON, Canada ⁴Amazon, Seattle, WA, USA. |
| Pseudocode | Yes | Algorithm 1: ExcessMTL |
| Open Source Code | Yes | Our code is available at https://github.com/yifei-he/ExcessMTL. |
| Open Datasets | Yes | MultiMNIST (Sabour et al., 2017) is a multi-task version of the MNIST dataset. Office-Home (Venkateswara et al., 2017) consists of four image classification tasks: artistic images, clip art, product images, and real-world images. NYUv2 (Silberman et al., 2012) consists of RGB-D indoor images. |
| Dataset Splits | Yes | On the Office-Home dataset, we allocate 20% of the training data as a clean validation set and inject noise into the remainder. |
| Hardware Specification | Yes | The experiments are run on NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer and ReLU activation' and 'For the implementation of baselines, we use the code from Lin & Zhang (2023) and Navon et al. (2022)'. However, it does not provide specific version numbers for these software components or the underlying programming languages/frameworks (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | On MultiMNIST, we use a two-layer CNN with kernel size 5 followed by one fully connected layer with 80 hidden units as the feature extractor, trained with learning rate 1e-3. On Office-Home, we use a ResNet 18 (without pretraining) as the shared feature extractor, which is trained using a weight decay of 1e-5. The learning rate is 1e-4. On NYUv2... We use a weight decay of 1e-3 and the learning rate is 1e-4. To address this, we deploy a warm-up strategy, where we do not do weight update in the first 3 epochs to collect the average risks over those epochs as an estimation of the initial excess risk. |
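
The Dataset Splits and Experiment Setup rows describe two procedures that are easy to misread from prose: holding out 20% of the training data as a clean validation set before injecting noise into the remainder, and averaging per-task risks over the first 3 epochs (with no weight updates) to estimate the initial excess risk. The sketch below is one possible reading of those two steps in plain Python, not the authors' implementation. The function names, the `seed` argument, the `noise_rate` value, and the symmetric label-flipping noise model are all assumptions made for illustration; the excerpts above do not specify them.

```python
import random


def split_and_inject_noise(labels, num_classes, val_fraction=0.2,
                           noise_rate=0.4, seed=0):
    """Hold out a clean validation subset, then corrupt the remaining labels.

    Assumption: noise is symmetric, i.e. a corrupted label is replaced by a
    uniformly chosen *different* class. noise_rate=0.4 is a placeholder.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # The first val_fraction of the shuffled indices stays clean.
    n_val = int(round(val_fraction * len(indices)))
    val_idx = sorted(indices[:n_val])
    train_idx = sorted(indices[n_val:])

    noisy_labels = list(labels)
    for i in train_idx:
        if rng.random() < noise_rate:
            other_classes = [c for c in range(num_classes) if c != labels[i]]
            noisy_labels[i] = rng.choice(other_classes)
    return train_idx, val_idx, noisy_labels


def warmup_initial_risks(epoch_task_risks, warmup_epochs=3):
    """Average each task's risk over the warm-up epochs.

    epoch_task_risks[e][t] is the risk of task t measured in epoch e.
    Assumption: the caller recorded these values without updating the model
    weights during the first `warmup_epochs` epochs.
    """
    warm = epoch_task_risks[:warmup_epochs]
    if len(warm) < warmup_epochs:
        raise ValueError("need at least `warmup_epochs` epochs of risks")
    num_tasks = len(warm[0])
    return [sum(epoch[t] for epoch in warm) / len(warm)
            for t in range(num_tasks)]


if __name__ == "__main__":
    toy_labels = [i % 5 for i in range(100)]
    train_idx, val_idx, noisy = split_and_inject_noise(toy_labels, num_classes=5)
    print(len(train_idx), len(val_idx))

    # Two tasks, four recorded epochs; only the first three are averaged.
    risks = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [0.0, 0.0]]
    print(warmup_initial_risks(risks))
```

Splitting before corrupting the labels is what keeps the validation subset clean, and fixing the seed makes the split repeatable across runs. For the actual noise model and risk estimator, the released code at the repository linked in the table is the authoritative source.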