Dependency or Span, End-to-End Uniform Semantic Role Labeling
Authors: Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhuosheng Zhang, Xi Zhou, Xiang Zhou
AAAI 2019, pp. 6730-6737 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our single model achieves new state-of-the-art results on both span (CoNLL 2005, 2012) and dependency (CoNLL 2008, 2009) SRL benchmarks. For span SRL, our single model outperforms the previous best results by 0.3% and 0.5% F1-score on CoNLL 2005 and 2012 test sets respectively. For dependency SRL, we achieve new state-of-the-art of 85.3% F1 and 90.4% F1 on CoNLL 2008 and 2009 benchmarks respectively. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University; (2) Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; (3) CloudWalk Technology, Shanghai, China |
| Pseudocode | No | The paper describes the model architecture and components with text and equations, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available here: https://github.com/bcmi220/unisrl. |
| Open Datasets | Yes | For span SRL, we test model on the common span SRL datasets from CoNLL-2005 (Carreras and Màrquez 2005) and CoNLL-2012 (Pradhan et al. 2013) shared tasks. For dependency SRL, we experiment on CoNLL 2008 (Surdeanu et al. 2008) and 2009 (Hajič et al. 2009) benchmarks. |
| Dataset Splits | Yes | The CoNLL-2005 dataset takes sections 2-21 of Wall Street Journal (WSJ) data as the training set, and section 24 as the development set. The test set consists of section 23 of WSJ for in-domain evaluation together with 3 sections from the Brown corpus for out-of-domain evaluation. ... the training, development and test splits of English data are identical to that of CoNLL-2005. |
| Hardware Specification | Yes | All models are trained for up to 600 epochs with batch size 40 on a single NVIDIA GeForce GTX 1080Ti GPU, which occupies 8 GB graphic memory and takes 12 to 36 hours. |
| Software Dependencies | No | The paper mentions using GloVe vectors, ELMo, and the Adam optimizer, but does not specify software dependencies with version numbers (e.g., Python version, PyTorch version). |
| Experiment Setup | Yes | The word embeddings are 300-dimensional GloVe vectors... The character representations, with dimension 8, are randomly initialized. In the character CNN, the convolutions have window sizes of 3, 4, and 5, each consisting of 50 filters. Moreover, we use 3 stacked bidirectional LSTMs with 200-dimensional hidden states... We apply 0.5 dropout to the word embeddings and character CNN outputs and 0.2 dropout to all hidden layers and feature embeddings. In the LSTMs, we employ variational dropout masks that are shared across timesteps (Gal and Ghahramani 2016), with 0.4 dropout rate. All models are trained for up to 600 epochs with batch size 40 on a single NVIDIA GeForce GTX 1080Ti GPU, which occupies 8 GB graphic memory and takes 12 to 36 hours. |
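
To make the reported encoder configuration concrete, the following is a minimal PyTorch sketch assembled from the hyperparameters quoted above (300-d GloVe word embeddings, an 8-d character CNN with window sizes 3/4/5 and 50 filters each, 3 stacked 200-d BiLSTMs, and the stated dropout rates). It is a hedged reconstruction, not the authors' released code (https://github.com/bcmi220/unisrl); the class names `CharCNN` and `Encoder` are hypothetical, and PyTorch's built-in inter-layer LSTM dropout is used as a stand-in for the variational dropout masks described in the paper.

```python
# Sketch of the encoder configuration reported in the Experiment Setup row.
# Assumptions: class/variable names are hypothetical; nn.LSTM's inter-layer
# dropout is NOT the variational (timestep-shared) dropout of Gal & Ghahramani
# (2016) used in the paper, it is only a rough stand-in.
import torch
import torch.nn as nn


class CharCNN(nn.Module):
    """Character CNN: 8-dim char embeddings, window sizes 3/4/5, 50 filters each."""
    def __init__(self, num_chars, char_dim=8, windows=(3, 4, 5), filters=50):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, filters, kernel_size=w, padding=w // 2) for w in windows
        )

    def forward(self, char_ids):                 # (batch, seq_len, max_word_len)
        b, s, c = char_ids.shape
        x = self.char_emb(char_ids).view(b * s, c, -1).transpose(1, 2)
        feats = [conv(x).max(dim=2).values for conv in self.convs]  # max-pool over chars
        return torch.cat(feats, dim=-1).view(b, s, -1)              # (batch, seq_len, 150)


class Encoder(nn.Module):
    """3 stacked BiLSTMs with 200-dim hidden states over [GloVe; char CNN] input."""
    def __init__(self, vocab_size, num_chars, glove_dim=300):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, glove_dim)  # initialized from GloVe in practice
        self.char_cnn = CharCNN(num_chars)
        self.emb_dropout = nn.Dropout(0.5)       # 0.5 on word embeddings / char CNN outputs
        self.lstm = nn.LSTM(glove_dim + 150, 200, num_layers=3,
                            bidirectional=True, batch_first=True, dropout=0.4)
        self.hidden_dropout = nn.Dropout(0.2)    # 0.2 on hidden layers / feature embeddings

    def forward(self, word_ids, char_ids):
        x = torch.cat([self.word_emb(word_ids), self.char_cnn(char_ids)], dim=-1)
        out, _ = self.lstm(self.emb_dropout(x))
        return self.hidden_dropout(out)          # (batch, seq_len, 400)
```

Under the same assumptions, training would use the Adam optimizer with batch size 40 for up to 600 epochs, as quoted in the rows above.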