Improving Sequence-to-Sequence Constituency Parsing

Authors: Lemao Liu, Muhua Zhu, Shuming Shi

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Intensive experiments show that our parser delivers substantial improvements in accuracy over the bottom-up linearization, achieving a 92.3 F-score on Penn English Treebank section 23 and an 85.4 F-score on the Penn Chinese Treebank test set, without reranking or semi-supervised training.
Researcher Affiliation | Industry | Lemao Liu, Muhua Zhu, Shuming Shi (Tencent AI Lab, Shenzhen, China)
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper mentions using a third-party toolkit ('https://code.google.com/p/berkeley-parser-analyser/') but does not provide access to the authors' own source code for their methodology.
Open Datasets | Yes | For the English task, we use the WSJ portion of the Penn Treebank (PTB) (Marcus, Marcinkiewicz, and Santorini 1993) and follow the standard splits: sections 2-21 for training, section 24 for development, and section 23 for testing. For the Chinese task, we use the Penn Chinese Treebank (CTB) version 5.1 (Xue et al. 2005): articles 001-270 and 440-1151 for training, articles 301-325 for development, and articles 271-300 for testing.
Dataset Splits | Yes | Standard splits are followed: for PTB, training is sections 2-21, development is section 24, and test is section 23; for CTB 5.1, training is articles 001-270 and 440-1151, development is articles 301-325, and test is articles 271-300.
Hardware Specification | No | The paper mentions parallelization with GPUs but does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for experiments.
Software Dependencies | No | The paper mentions using the 'Stanford tagger' and 'word2vec toolkit' but does not specify exact version numbers for these or any other software dependencies crucial for replication.
Experiment Setup | Yes | The overall hyper-parameters are shown in Table 2 and were not further tuned in the experiments. Because the training algorithm is stochastic, it was run independently five times and the final result reported as the average over these runs; the ensemble model uses these five independent models as its members. Table 2 (hyper-parameters): LSTM layers 1; hidden unit dim 512; trained word embedding dim 256; English pretrained word embedding dim 100; Chinese pretrained word embedding dim 200; POS tag embedding dim 128; dropout rate 0.3; English batch size 10; Chinese batch size 5; beam size in search 10.
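For anyone attempting a replication, the dataset splits and Table 2 hyper-parameters reported above can be collected into a single configuration sketch. The section/article numbers and values below come from the paper; all key and variable names are illustrative, not from the authors' (unreleased) code.

```python
# Standard splits as stated in the paper (names are illustrative).
PTB_SPLITS = {
    "train": list(range(2, 22)),    # WSJ sections 02-21
    "dev": [24],                    # section 24
    "test": [23],                   # section 23
}
CTB_SPLITS = {
    "train": list(range(1, 271)) + list(range(440, 1152)),  # articles 001-270, 440-1151
    "dev": list(range(301, 326)),   # articles 301-325
    "test": list(range(271, 301)),  # articles 271-300
}

# Table 2 hyper-parameters; the paper gives only the values,
# so the key names here are assumptions.
HYPERPARAMS = {
    "lstm_layers": 1,
    "hidden_dim": 512,
    "trained_word_emb_dim": 256,
    "pretrained_word_emb_dim": {"english": 100, "chinese": 200},
    "pos_tag_emb_dim": 128,
    "dropout": 0.3,
    "batch_size": {"english": 10, "chinese": 5},
    "beam_size": 10,
}
```

Note that, per the report, five independent runs with these fixed settings were averaged, so a replication should expect run-to-run variance rather than a single deterministic score.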