Latent Multi-Task Architecture Learning
Authors: Sebastian Ruder, Joachim Bingel, Isabelle Augenstein, Anders Søgaard
AAAI 2019, pp. 4822-4829 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL. |
| Researcher Affiliation | Collaboration | (1) Insight Research Centre, National University of Ireland, Galway; (2) Aylien Ltd., Dublin, Ireland; (3) Department of Computer Science, University of Copenhagen, Denmark |
| Pseudocode | No | The paper does not include any explicit pseudocode blocks or algorithms labeled as such. It describes the model and mathematical formulations in prose and equations. |
| Open Source Code | Yes | We implement all models in DyNet (Neubig et al. 2017) and make our code available at https://github.com/sebastianruder/sluice-networks. |
| Open Datasets | Yes | As testbed for our experiments, we choose the OntoNotes 5.0 dataset (Weischedel et al. 2013), not only due to its high inter-annotator agreement (Hovy et al. 2006), but also because it enables us to analyze the generalization ability of our models across different tasks and domains. The OntoNotes dataset provides data annotated for an array of tasks across different languages and domains. |
| Dataset Splits | Yes | We train our models on each domain and evaluate them both on the in-domain test set (Table 3, top) as well as on the test sets of all other domains (Table 3, bottom) to evaluate their out-of-domain generalization ability. ... Development-set sizes per domain (from Table 1): 29957, 25271, 15421, 147955, 25206, 11200, 49393. ... We perform early stopping with patience of 2 based on the main task and hyperparameter optimization on the in-domain development data of the newswire domain. (A sketch of this early-stopping loop follows the table.) |
| Hardware Specification | No | The paper describes the model architecture and training procedure, but it does not provide specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions 'We implement all models in DyNet (Neubig et al. 2017)', indicating the use of DyNet. However, it does not specify a version number for DyNet or any other software libraries or programming languages used. |
| Experiment Setup | Yes | The BiLSTM consists of 3 layers with 100 dimensions and uses 64-dimensional word and 100-dimensional character embeddings, which are both randomly initialized. The output layer is an MLP with a dimensionality of 100. ... We train our models with stochastic gradient descent (SGD), an initial learning rate of 0.1, and learning rate decay. ... We initialize α parameters with a bias towards one source subspace for each direction and initialize β parameters with a bias towards the last layer. (A hedged sketch of this setup follows the table.) |
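
The Experiment Setup row can be made concrete with a small sketch. The following is a minimal, hypothetical reconstruction of the single-task building block (a 3-layer, 100-dimensional BiLSTM over 64-dimensional word embeddings with a 100-dimensional MLP output layer, trained with SGD at an initial learning rate of 0.1) using the DyNet Python API, which the paper names as its framework. Vocabulary and label-set sizes are placeholders, the 100-dimensional character embeddings and the sluice-network α/β sharing parameters are omitted, and the learning-rate decay schedule is left out because the paper's quoted setup does not specify it.

```python
# Hedged sketch of the reported single-task setup; not the authors' code.
import dynet as dy

VOCAB_SIZE = 20000   # assumption: placeholder vocabulary size
NUM_LABELS = 17      # assumption: placeholder tag-set size
WORD_DIM = 64        # 64-dimensional word embeddings, randomly initialized
HIDDEN_DIM = 100     # BiLSTM dimensionality of 100
NUM_LAYERS = 3       # 3 BiLSTM layers
MLP_DIM = 100        # MLP output layer with dimensionality 100

pc = dy.ParameterCollection()
word_emb = pc.add_lookup_parameters((VOCAB_SIZE, WORD_DIM))
# The paper also uses 100-dimensional character embeddings; that component
# is omitted here for brevity, so the BiLSTM input is just the word embedding.
bilstm = dy.BiRNNBuilder(NUM_LAYERS, WORD_DIM, HIDDEN_DIM, pc, dy.LSTMBuilder)
W_h = pc.add_parameters((MLP_DIM, HIDDEN_DIM))
b_h = pc.add_parameters((MLP_DIM,))
W_o = pc.add_parameters((NUM_LABELS, MLP_DIM))
b_o = pc.add_parameters((NUM_LABELS,))

# SGD with an initial learning rate of 0.1; the decay schedule is unspecified.
trainer = dy.SimpleSGDTrainer(pc, 0.1)

def sentence_loss(word_ids, label_ids):
    """Negative log-likelihood of the gold tags for one sentence."""
    dy.renew_cg()
    inputs = [word_emb[w] for w in word_ids]
    states = bilstm.transduce(inputs)          # one 100-d vector per token
    losses = []
    for state, gold in zip(states, label_ids):
        hidden = dy.tanh(dy.parameter(W_h) * state + dy.parameter(b_h))
        scores = dy.parameter(W_o) * hidden + dy.parameter(b_o)
        losses.append(dy.pickneglogsoftmax(scores, gold))
    return dy.esum(losses)
```

A training step would compute `sentence_loss(...)` for a sentence, call `.backward()` on the returned expression, and then call `trainer.update()`.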
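
Similarly, the early-stopping regime quoted in the Dataset Splits row (patience of 2, monitored on the main task's in-domain newswire development data) could look roughly like the framework-agnostic loop below; `train_one_epoch` and `evaluate_main_task_dev` are hypothetical callbacks, and the maximum number of epochs is an assumption.

```python
# Hedged sketch of early stopping with patience 2; callback names are hypothetical.
def train_with_early_stopping(train_one_epoch, evaluate_main_task_dev,
                              max_epochs=100, patience=2):
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        score = evaluate_main_task_dev()     # main-task score on in-domain dev data
        if score > best_score:
            best_score = score
            epochs_without_improvement = 0   # reset the patience counter
            # a real implementation would checkpoint the model here
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # patience of 2 exhausted
    return best_score
```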