Learning Multi-Task Communication with Message Passing for Sequence Learning

Authors: Pengfei Liu, Jie Fu, Yue Dong, Xipeng Qiu, Jackie Chi Kit Cheung

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments in text classification and sequence labelling to evaluate our approach on multi-task learning and transfer learning. The empirical results show that our models not only outperform competitive baselines, but also learn interpretable and transferable patterns across tasks.
Researcher Affiliation | Academia | School of Computer Science, Fudan University; Shanghai Institute of Intelligent Electronics & Systems; MILA, McGill University. {pfliu14,xpqiu}@fudan.edu.cn, jie.fu@polymtl.ca, yue.dong2@mail.mcgill.ca, jcheung@cs.mcgill.ca
Pseudocode | Yes | Algorithm 1: Training Process for Multi-task Learning over Graph Structures
Open Source Code | No | The paper does not contain an explicit statement or a link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The word embeddings for all of the models are initialized with the 200-dimensional GloVe vectors (840B token version; Pennington, Socher, and Manning 2014). We use the following benchmark datasets in our experiments: Penn Treebank (PTB) POS tagging, CoNLL 2000 chunking, and CoNLL 2003 English NER. The statistics of the datasets are described in Table 3.
Dataset Splits | Yes | All the datasets in each task are partitioned into training, validation, and test sets of 1400, 200, and 400 samples, respectively.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, nor does it provide specific details such as GPU/CPU models, processor types, or memory specifications.
Software Dependencies | No | The paper mentions using GloVe vectors and AdaDelta for optimization, but it does not provide version numbers for software libraries or frameworks such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | The mini-batch size is set to 8. For each task, we take the hyperparameters which achieve the best performance on the development set via a grid search over combinations of the hidden size [100, 200, 300] and l2 regularization [0.0, 5E-5, 1E-5]. Additionally, for text classification tasks, we set an equal lambda for each task, while for tagging tasks we run a grid search of lambda over [1, 0.8, 0.5] and take the hyperparameters which achieve the best performance on the development set. Based on the validation performance, we choose the size of the hidden state as 200 and l2 as 0.0. We apply stochastic gradient descent with the diagonal variant of AdaDelta for optimization (Zeiler 2012).
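
The experiment setup above amounts to a small hyperparameter grid selected against the development set. Below is a minimal sketch of that search, assuming PyTorch (the paper does not name its framework) and hypothetical build_model and evaluate helpers; it illustrates the reported settings and is not the authors' released code.

```python
# A minimal sketch of the hyperparameter search described above, assuming
# PyTorch and hypothetical build_model / evaluate helpers. Illustration only.
import itertools
import torch

HIDDEN_SIZES = [100, 200, 300]      # hidden-state sizes searched
L2_PENALTIES = [0.0, 5e-5, 1e-5]    # l2 regularization strengths searched
BATCH_SIZE = 8                      # mini-batch size reported in the paper


def grid_search(train_loader, dev_loader, num_epochs=10):
    """Train one model per (hidden size, l2) pair and keep the best dev score."""
    best_score, best_config = float("-inf"), None
    for hidden_size, l2 in itertools.product(HIDDEN_SIZES, L2_PENALTIES):
        model = build_model(hidden_size=hidden_size)  # hypothetical model factory
        # AdaDelta optimizer (Zeiler 2012); weight_decay supplies the l2 penalty.
        optimizer = torch.optim.Adadelta(model.parameters(), weight_decay=l2)
        for _ in range(num_epochs):
            for batch in train_loader:  # mini-batches of size 8
                optimizer.zero_grad()
                loss = model(batch)     # hypothetical: forward pass returns the loss
                loss.backward()
                optimizer.step()
        score = evaluate(model, dev_loader)  # hypothetical dev-set metric
        if score > best_score:
            best_score, best_config = score, (hidden_size, l2)
    return best_config, best_score
```

Under this sketch, the pair returned by grid_search corresponds to the configuration the paper reports selecting on the development set (hidden size 200, l2 of 0.0).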