Learning Sparse Sharing Architectures for Multiple Tasks
Authors: Tianxiang Sun, Yunfan Shao, Xiaonan Li, Pengfei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang
AAAI 2020, pp. 8936-8943
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on three sequence labeling tasks. Compared with single-task models and three typical multi-task learning baselines, our proposed approach achieves consistent improvement while requiring fewer parameters. |
| Researcher Affiliation | Academia | Shanghai Key Laboratory of Intelligent Information Processing, Fudan University; School of Computer Science, Fudan University. {txsun19, yfshao19, pfliu14, hyan19, xpqiu, xjhuang}@fudan.edu.cn, lixiaonan@stu.xidian.edu.cn |
| Pseudocode | Yes | Algorithm 1 Sparse Sharing Architecture Learning (a hedged code sketch of this algorithm follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | Our experiments are carried out on several widely used sequence labeling datasets, including Penn Treebank (PTB) (Marcus, Santorini, and Marcinkiewicz 1993), CoNLL-2000 (Sang and Buchholz 2000), CoNLL-2003 (Sang and Meulder 2003) and OntoNotes 5.0 English (Pradhan et al. 2012). |
| Dataset Splits | Yes | The statistics of the datasets are summarized in Table 1. We use the Wall Street Journal (WSJ) portion of PTB for POS. For OntoNotes, data in the pt domain is excluded from our experiments due to its lack of NER annotations. The parse bits in OntoNotes are converted to chunking tags, the same as CoNLL-2003. We use the BIOES tagging scheme for NER and BIO2 for Chunking. Dataset sizes (Train / Dev / Test): PTB 912,344 / 131,768 / 129,654; CoNLL-2000 211,727 / n/a / 47,377; CoNLL-2003 204,567 / 51,578 / 46,666; OntoNotes 5.0 1,903,815 / 279,495 / 204,235. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments, only general model and training settings. |
| Software Dependencies | No | The paper mentions software components like CNN-BiLSTM, GloVe, Dropout, SGD, and CRF, but does not specify their version numbers or any other software dependencies with version details. |
| Experiment Setup | Yes | Main hyper-parameters are summarized in Table 2: embedding dimension 100, convolution width 3, CNN output size 30, LSTM hidden size 200, learning rate 0.1, dropout 0.5, mini-batch size 10. In all of our experiments, we use global pruning with α = 0.1. Word embeddings are excluded from pruning. Besides, we set the MTW steps w = 20, 10, 10 epochs in Exp1, Exp2, Exp3 respectively. (A hedged sketch of this pruning setting follows the table.) |
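
The Pseudocode row above refers to Algorithm 1 (Sparse Sharing Architecture Learning): each task first learns a binary sub-network mask over a shared base model via iterative magnitude pruning with rewinding to the shared initialization, and the tasks are then trained jointly, each one forwarding through and updating only its own sub-network. The sketch below is a minimal illustration of that two-phase procedure, assuming PyTorch; the helper names (`magnitude_mask`, `learn_task_mask`, `multi_task_train`), the plain SGD optimizer, and the fixed number of pruning rounds are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of Algorithm 1 (Sparse Sharing Architecture Learning).
# Phase 1: per-task iterative magnitude pruning (IMP) with rewinding.
# Phase 2: joint training where each task uses only its own sub-network.
import copy
import torch
import torch.nn as nn


def magnitude_mask(model: nn.Module, keep_ratio: float) -> dict:
    """Global magnitude pruning: keep the `keep_ratio` fraction of weights
    with the largest absolute value; return {param_name: binary mask}."""
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return {n: (p.detach().abs() >= threshold).float()
            for n, p in model.named_parameters()}


def learn_task_mask(base_model, init_state, train_epoch, alpha=0.1, rounds=5):
    """IMP for one task: train, prune a fraction `alpha` of the remaining
    weights, rewind to the shared initialization, and repeat."""
    model = copy.deepcopy(base_model)
    model.load_state_dict(init_state)
    keep = 1.0
    mask = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_epoch(model)                        # single-task training pass
        keep *= (1.0 - alpha)
        mask = magnitude_mask(model, keep)
        model.load_state_dict(init_state)         # rewind before the next round
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(mask[n])                   # zero out the pruned weights
    return mask


def multi_task_train(model, masks, task_batches, compute_loss, epochs=10, lr=0.1):
    """Joint training: each batch is forwarded through W * mask_task and only
    the weights inside that task's sub-network are updated."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for task, batches in task_batches.items():
            mask = masks[task]
            for batch in batches:
                saved = {n: p.detach().clone()
                         for n, p in model.named_parameters()}
                with torch.no_grad():
                    for n, p in model.named_parameters():
                        p.mul_(mask[n])           # route through the task's subnet
                optimizer.zero_grad()
                loss = compute_loss(model, task, batch)
                loss.backward()
                with torch.no_grad():
                    for n, p in model.named_parameters():
                        if p.grad is not None:
                            p.grad.mul_(mask[n])  # keep pruned weights frozen
                optimizer.step()
                with torch.no_grad():
                    for n, p in model.named_parameters():
                        # plain SGD leaves zeroed, zero-gradient weights at
                        # zero, so adding the saved values restores them intact
                        p.add_(saved[n] * (1.0 - mask[n]))
    return model
```

Restoring the saved weights after each step relies on plain SGD leaving zeroed weights with zero gradient unchanged; an optimizer with momentum or weight decay would need explicit update masking instead.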
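
The Experiment Setup row states that global pruning is used with α = 0.1 and that word embeddings are excluded from pruning. The helper below sketches one way such a pruning step could look, again assuming PyTorch; identifying embedding parameters by the substring "embedding" in their names is an assumption made for illustration.

```python
# Hedged sketch of the pruning setting quoted above: global magnitude pruning
# with rate ALPHA = 0.1 per iteration, with word embeddings never pruned.
import torch
import torch.nn as nn

ALPHA = 0.1  # fraction of the remaining weights pruned at each iteration


def global_prune_step(model: nn.Module, current_mask: dict) -> dict:
    """One global pruning step: prune a further ALPHA fraction of the
    currently kept weights, ranked by magnitude across all prunable
    parameters. Embedding parameters always keep a full mask."""
    prunable = {n: p for n, p in model.named_parameters()
                if "embedding" not in n.lower()}
    # Collect magnitudes of the weights that are still kept (mask == 1).
    kept = torch.cat([p.detach().abs()[current_mask[n].bool()]
                      for n, p in prunable.items()])
    k = int((1.0 - ALPHA) * kept.numel())          # how many weights to keep
    threshold = torch.topk(kept, k).values.min() if k > 0 else float("inf")
    new_mask = {}
    for n, p in model.named_parameters():
        if n in prunable:
            new_mask[n] = ((p.detach().abs() >= threshold).float()
                           * current_mask[n])
        else:
            new_mask[n] = torch.ones_like(p)       # embeddings are never pruned
    return new_mask
```

A per-task mask would start as all ones (`{n: torch.ones_like(p) for n, p in model.named_parameters()}`) and be refined by calling `global_prune_step` once per pruning round.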