Learning Sparse Sharing Architectures for Multiple Tasks
Authors: Tianxiang Sun, Yunfan Shao, Xiaonan Li, Pengfei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang
AAAI 2020, pp. 8936-8943
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on three sequence labeling tasks. Compared with single-task models and three typical multi-task learning baselines, our proposed approach achieves consistent improvement while requiring fewer parameters. |
| Researcher Affiliation | Academia | Shanghai Key Laboratory of Intelligent Information Processing, Fudan University; School of Computer Science, Fudan University. {txsun19, yfshao19, pfliu14, hyan19, xpqiu, xjhuang}@fudan.edu.cn, lixiaonan@stu.xidian.edu.cn |
| Pseudocode | Yes | Algorithm 1 Sparse Sharing Architecture Learning (a hedged code sketch of this algorithm follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | Our experiments are carried out on several widely used sequence labeling datasets, including Penn Treebank (PTB) (Marcus, Santorini, and Marcinkiewicz 1993), CoNLL-2000 (Sang and Buchholz 2000), CoNLL-2003 (Sang and Meulder 2003) and OntoNotes 5.0 English (Pradhan et al. 2012). |
| Dataset Splits | Yes | The statistics of the datasets are summarized in Table 1. We use the Wall Street Journal (WSJ) portion of PTB for POS. For OntoNotes, data in the pt domain is excluded from our experiments due to its lack of NER annotations. The parse bits in OntoNotes are converted to chunking tags, the same as CoNLL-2003. We use the BIOES tagging scheme for NER and BIO2 for Chunking. Dataset sizes (Train / Dev / Test): PTB 912,344 / 131,768 / 129,654; CoNLL-2000 211,727 / n/a / 47,377; CoNLL-2003 204,567 / 51,578 / 46,666; OntoNotes 5.0 1,903,815 / 279,495 / 204,235. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments, only general model and training settings. |
| Software Dependencies | No | The paper mentions software components like CNN-BiLSTM, GloVe, Dropout, SGD, and CRF, but does not specify their version numbers or any other software dependencies with version details. |
| Experiment Setup | Yes | Main hyper-parameters are summarized in Table 2: embedding dimension 100, convolution width 3, CNN output size 30, LSTM hidden size 200, learning rate 0.1, dropout 0.5, mini-batch size 10. In all of our experiments, we use global pruning with α = 0.1. Word embeddings are excluded from pruning. Besides, we set the MTW steps w = 20, 10, 10 epochs in Exp1, Exp2, Exp3 respectively. (A hedged sketch of this pruning setting follows the table.) |
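
The Pseudocode row above refers to Algorithm 1 (Sparse Sharing Architecture Learning): each task first learns a binary sub-network mask over a shared base model via iterative magnitude pruning with rewinding to the shared initialization, and the tasks are then trained jointly, each one forwarding through and updating only its own sub-network. The sketch below is a minimal illustration of that two-phase procedure, assuming PyTorch; the helper names (`magnitude_mask`, `learn_task_mask`, `multi_task_train`), the plain SGD optimizer, and the fixed number of pruning rounds are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of Algorithm 1 (Sparse Sharing Architecture Learning).
# Phase 1: per-task iterative magnitude pruning (IMP) with rewinding.
# Phase 2: joint training where each task uses only its own sub-network.
import copy
import torch
import torch.nn as nn


def magnitude_mask(model: nn.Module, keep_ratio: float) -> dict:
    """Global magnitude pruning: keep the `keep_ratio` fraction of weights
    with the largest absolute value; return {param_name: binary mask}."""
    scores = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return {n: (p.detach().abs() >= threshold).float()
            for n, p in model.named_parameters()}


def learn_task_mask(base_model, init_state, train_epoch, alpha=0.1, rounds=5):
    """IMP for one task: train, prune a fraction `alpha` of the remaining
    weights, rewind to the shared initialization, and repeat."""
    model = copy.deepcopy(base_model)
    model.load_state_dict(init_state)
    keep = 1.0
    mask = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_epoch(model)                        # single-task training pass
        keep *= (1.0 - alpha)
        mask = magnitude_mask(model, keep)
        model.load_state_dict(init_state)         # rewind before the next round
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(mask[n])                   # zero out the pruned weights
    return mask


def multi_task_train(model, masks, task_batches, compute_loss, epochs=10, lr=0.1):
    """Joint training: each batch is forwarded through W * mask_task and only
    the weights inside that task's sub-network are updated."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for task, batches in task_batches.items():
            mask = masks[task]
            for batch in batches:
                saved = {n: p.detach().clone()
                         for n, p in model.named_parameters()}
                with torch.no_grad():
                    for n, p in model.named_parameters():
                        p.mul_(mask[n])           # route through the task's subnet
                optimizer.zero_grad()
                loss = compute_loss(model, task, batch)
                loss.backward()
                with torch.no_grad():
                    for n, p in model.named_parameters():
                        if p.grad is not None:
                            p.grad.mul_(mask[n])  # keep pruned weights frozen
                optimizer.step()
                with torch.no_grad():
                    for n, p in model.named_parameters():
                        # plain SGD leaves zeroed, zero-gradient weights at
                        # zero, so adding the saved values restores them intact
                        p.add_(saved[n] * (1.0 - mask[n]))
    return model
```

Restoring the saved weights after each step relies on plain SGD leaving zeroed weights with zero gradient unchanged; an optimizer with momentum or weight decay would need explicit update masking instead.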
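
The Experiment Setup row states that global pruning is used with α = 0.1 and that word embeddings are excluded from pruning. The helper below sketches one way such a pruning step could look, again assuming PyTorch; identifying embedding parameters by the substring "embedding" in their names is an assumption made for illustration.

```python
# Hedged sketch of the pruning setting quoted above: global magnitude pruning
# with rate ALPHA = 0.1 per iteration, with word embeddings never pruned.
import torch
import torch.nn as nn

ALPHA = 0.1  # fraction of the remaining weights pruned at each iteration


def global_prune_step(model: nn.Module, current_mask: dict) -> dict:
    """One global pruning step: prune a further ALPHA fraction of the
    currently kept weights, ranked by magnitude across all prunable
    parameters. Embedding parameters always keep a full mask."""
    prunable = {n: p for n, p in model.named_parameters()
                if "embedding" not in n.lower()}
    # Collect magnitudes of the weights that are still kept (mask == 1).
    kept = torch.cat([p.detach().abs()[current_mask[n].bool()]
                      for n, p in prunable.items()])
    k = int((1.0 - ALPHA) * kept.numel())          # how many weights to keep
    threshold = torch.topk(kept, k).values.min() if k > 0 else float("inf")
    new_mask = {}
    for n, p in model.named_parameters():
        if n in prunable:
            new_mask[n] = ((p.detach().abs() >= threshold).float()
                           * current_mask[n])
        else:
            new_mask[n] = torch.ones_like(p)       # embeddings are never pruned
    return new_mask
```

A per-task mask would start as all ones (`{n: torch.ones_like(p) for n, p in model.named_parameters()}`) and be refined by calling `global_prune_step` once per pruning round.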