Understanding and Improving Information Transfer in Multi-Task Learning
Authors: Sen Wu, Hongyang R. Zhang, Christopher Ré
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We investigate multi-task learning approaches that use a shared feature representation for all tasks. ... Inspired by the theoretical insights, we show that aligning tasks' embedding layers leads to performance gains for multi-task training and transfer learning on the GLUE benchmark and sentiment analysis tasks; for example, we obtain a 2.35% GLUE score average improvement on 5 GLUE tasks over BERT-LARGE using our alignment method. We also design an SVD-based task reweighting scheme and show that it improves the robustness of multi-task training on a multi-label image dataset. |
| Researcher Affiliation | Academia | Sen Wu, Stanford University; Hongyang R. Zhang, University of Pennsylvania; Christopher Ré, Stanford University |
| Pseudocode | Yes | Algorithm 1: Covariance alignment for multi-task training; Algorithm 2: An SVD-based task re-weighting scheme (both sketched after this table) |
| Open Source Code | No | The paper does not provide a link or explicit statement about the release of the authors' own implementation code for the described methods (e.g., covariance alignment or SVD-based re-weighting scheme). Links provided refer to third-party models/libraries like BERT or CheXNet. |
| Open Datasets | Yes | GLUE: GLUE is a natural language understanding dataset... (Wang et al. (2018b)); Sentiment Analysis: This dataset includes six tasks... (Lei et al. (2018)); Chest X-ray14: This dataset contains... (Wang et al. (2017)). |
| Dataset Splits | Yes | For the sentiment analysis experiments, we randomly split the data into training, dev and test sets with percentages 80%, 10%, and 10% respectively. (A split sketch follows this table.) |
| Hardware Specification | Yes | The experiments are partly run on Stanford's SOAL cluster (https://5harad.com/soal-cluster/). |
| Software Dependencies | No | The paper mentions software names like BERT, LSTM, and DenseNet, but does not specify exact version numbers for these or other key software components (e.g., Python, PyTorch, TensorFlow versions) required for reproduction. |
| Experiment Setup | Yes | For the synthetic experiments, we do a grid search over the learning rate from {1e-4, 1e-3, 1e-2, 1e-1} and the number of epochs from {10, 20, 30, 40, 50}. We pick the best results for all the experiments. We choose the learning rate to be 1e-3, the number of epochs to be 30, and the batch size to be 50. |
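
The two algorithms named in the Pseudocode row are only summarized in the table. The first sketch below is an illustration, not the authors' Algorithm 1: it assumes "covariance alignment" means penalizing the Frobenius mismatch between per-task embedding covariances, and the weight `lam` and function names are hypothetical.

```python
import torch

def embedding_covariance(z: torch.Tensor) -> torch.Tensor:
    """Empirical covariance of one task's embeddings, shape (n, d) -> (d, d)."""
    z = z - z.mean(dim=0, keepdim=True)
    return z.T @ z / max(z.shape[0] - 1, 1)

def covariance_alignment_loss(task_embeddings: list) -> torch.Tensor:
    """Sum of squared Frobenius distances between all pairs of task covariances."""
    covs = [embedding_covariance(z) for z in task_embeddings]
    loss = torch.zeros((), device=covs[0].device)
    for i in range(len(covs)):
        for j in range(i + 1, len(covs)):
            loss = loss + ((covs[i] - covs[j]) ** 2).sum()
    return loss

# Hypothetical usage inside a multi-task training step:
# total_loss = sum(task_losses) + lam * covariance_alignment_loss(embeddings)
```

Algorithm 2 is described only as "an SVD-based task re-weighting scheme". One plausible instantiation, not claimed to match the paper's exact rule, scores each task by how much of its label column is captured by the top-r singular subspace of the stacked label matrix `Y`; the name `svd_task_weights` and the rank parameter `r` are illustrative.

```python
import numpy as np

def svd_task_weights(Y: np.ndarray, r: int) -> np.ndarray:
    """Y has shape (n, k): one label column per task. Weight each task by the
    fraction of its column's norm captured by the top-r singular subspace."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    captured = np.linalg.norm(U[:, :r].T @ Y, axis=0)       # projected norms
    scores = captured / (np.linalg.norm(Y, axis=0) + 1e-12)
    return scores / scores.sum()                            # normalized weights
```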
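
The 80/10/10 split quoted in the Dataset Splits row is simple to reproduce; the snippet below is one way to generate such a split (the seed and shuffling strategy are assumptions, as the paper does not state them).

```python
import numpy as np

def train_dev_test_split(n: int, seed: int = 0):
    """Random 80/10/10 index split over n examples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_dev = int(0.8 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_dev], idx[n_train + n_dev:]
```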
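
The grid in the Experiment Setup row is small enough to enumerate exhaustively. A sketch of such a driver follows; `train_and_evaluate` is a placeholder for the authors' (unreleased) training loop, and selecting by dev score is an assumption.

```python
from itertools import product

def grid_search(train_and_evaluate):
    """Enumerate the reported grid and keep the best configuration."""
    lrs = [1e-4, 1e-3, 1e-2, 1e-1]
    epoch_choices = [10, 20, 30, 40, 50]
    best = max(
        ((lr, ep, train_and_evaluate(lr=lr, epochs=ep, batch_size=50))
         for lr, ep in product(lrs, epoch_choices)),
        key=lambda t: t[2],
    )
    return best  # (lr, epochs, score); the paper settles on 1e-3 and 30 epochs
```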