Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Joint Learning of Constituency and Dependency Grammars by Decomposed Cross-Lingual Induction

Authors: Wenbin Jiang, Qun Liu, Thepchai Supnithi

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment on joint cross-lingual induction of constituency and dependency grammars from English to Chinese. We first verify the effectiveness of the transition-based variant model for constituency parsing. On the WSJ treebank, this model achieves accuracy comparable to the classic transition-based model. The joint constituency and dependency grammar induced by the decomposed strategy achieves very significant improvements in both constituency and dependency grammar induction. (Section 5: Experiments)
Researcher Affiliation | Academia | 1) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, China; 2) ADAPT Centre, School of Computing, Dublin City University, Ireland; 3) National Electronics and Computer Technology Center, Thailand
Pseudocode | Yes | Algorithm 1: K-beam transition-based parsing.
Open Source Code | No | No explicit statement or link providing access to the authors' own source code for the described methodology was found. The only URL provided is for a third-party maximum entropy toolkit.
Open Datasets | Yes | We first evaluate the performance of the remodeled transition-based parsing algorithm on the Wall Street Journal Treebank (WSJ) [Marcus et al., 1993]... We use the FBIS Chinese-English dataset as the bilingual corpus for cross-lingual induction. The accuracy of the induced grammar is evaluated on some portions of the Penn Chinese Treebank (CTB) [Xue et al., 2005].
Dataset Splits | Yes | Table 2: Data partitioning for WSJ and CTB, by section. WSJ: training 02-21, development 22, test 23. CTB: training 001-270, 400-931, 1001-1151; development 301-325; test 271-300.
Hardware Specification | No | No specific hardware details (such as GPU or CPU models, memory, or cloud instance types) used for running the experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions using 'the maximum entropy toolkit by Zhang' and 'GIZA++ [Och, 2003]' but does not provide version numbers for either dependency.
Experiment Setup | Yes | We set the Gaussian prior as 1.0, the cutoff threshold as 0 (without cutoff), and the maximum training iterations as 100, while leaving other parameters at default values. For the k-beam transition-based parsing algorithm... little further improvement is obtained with k larger than 16.
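The k-beam transition-based parsing named in Algorithm 1 is a beam-search variant of shift-reduce parsing: at each step, every state in the beam is expanded with all legal actions, the successors are scored, and only the k highest-scoring states are kept. The sketch below is illustrative only, not the authors' implementation; the action set (bare SHIFT/REDUCE) and the scoring function are simplified placeholders.

```python
# Illustrative k-beam shift-reduce parsing sketch (not the paper's code).
# A parser state is (score, stack, buffer); SHIFT moves a word onto the
# stack, REDUCE combines the top two stack items into a subtree.

def legal_actions(stack, buffer):
    actions = []
    if buffer:
        actions.append("SHIFT")
    if len(stack) >= 2:
        actions.append("REDUCE")
    return actions

def apply_action(stack, buffer, action):
    if action == "SHIFT":
        return stack + [buffer[0]], buffer[1:]
    # REDUCE: merge the top two stack items into one subtree
    return stack[:-2] + [(stack[-2], stack[-1])], buffer

def k_beam_parse(words, score_fn, k=4):
    beam = [(0.0, [], list(words))]
    # Parse until every state is complete: empty buffer, single tree on stack.
    while any(buf or len(st) > 1 for _, st, buf in beam):
        candidates = []
        for score, stack, buffer in beam:
            acts = legal_actions(stack, buffer)
            if not acts:  # finished state survives unchanged
                candidates.append((score, stack, buffer))
                continue
            for a in acts:
                ns, nb = apply_action(stack, buffer, a)
                candidates.append((score + score_fn(stack, buffer, a), ns, nb))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:k]  # keep only the k best states
    return beam[0]

# Toy scorer (an assumption for demonstration): reward SHIFT over REDUCE.
tree_score, tree, _ = k_beam_parse(
    ["a", "b", "c"],
    lambda st, buf, a: 1.0 if a == "SHIFT" else 0.5,
    k=2,
)
```

In the paper's setting the scores would come from a trained model (e.g., a maximum entropy classifier over parser configurations), and the Experiment Setup row notes that accuracy gains taper off for beam sizes above 16.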