A Morphology-Aware Network for Morphological Disambiguation

Authors: Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan Sonmez

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiments, we achieve 84.12%, 88.35% and 93.78% morphological disambiguation accuracy among the ambiguous words for Turkish, German and French, respectively.
Researcher Affiliation | Industry | Eray Yildiz, Caglar Tirkaz, H. Bahadir Sahin, Mustafa Tolga Eren, Ozan Sonmez. Huawei Turkey Research and Development Center, Umraniye, Istanbul, Turkey. {eray.yildiz, mustafa.tolga.eren}@huawei.com, {caglartirkaz, hbahadirsahin, osonmez}@gmail.com
Pseudocode | No | No pseudocode or algorithm blocks found.
Open Source Code | No | We make this test data publicly available [1] so that Turkish morphological disambiguation algorithms can be compared more accurately in the future.
Open Datasets | Yes | For Turkish, we used a semi-automatically disambiguated corpus containing 1M tokens (Yuret and Türe 2006). ... We use the SPMRL 2014 dataset (Seddah and Tsarfaty 2014) for German and French.
Dataset Splits | Yes | It provides 90% of all sentences as the training set and the remaining 10% as the test set. ... The development sets for each language are randomly separated from the training data and are used to optimize the embedding lengths of morphological features. (A split sketch follows the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are provided.
Software Dependencies | No | The HFST tool (Lindén, Silfverberg, and Pirinen 2009) is used to perform morphological analysis in German and French, whereas the analyzer of Oflazer (1993) is used for Turkish. No specific version numbers for these tools or other software dependencies are provided.
Experiment Setup | Yes | In the experiments, we used embedding lengths of 50, 20 and 5 for roots, POS tags and the other morphological features, respectively. The numbers of filters in the first and second layers are 30 and 40, respectively. The window length n, which determines the number of words input to the second layer, is set to 5. Training is performed with stochastic gradient descent and AdaGrad (Duchi, Hazan, and Singer 2011) as the optimization algorithm. (A configuration sketch follows the table.)
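The Dataset Splits row quotes a 90% train / 10% test split at the sentence level, with development sets drawn at random from the training data. The paper does not publish splitting code or the development-set ratio, so the Python sketch below is only a plausible reconstruction; the function name split_sentences, the 10% development ratio, and the fixed seed are assumptions, not values from the paper.

```python
import random

def split_sentences(sentences, test_ratio=0.10, dev_ratio=0.10, seed=0):
    """Split a corpus into (train, dev, test) sentence lists.

    test_ratio matches the 90/10 split reported in the paper;
    dev_ratio and seed are assumptions made for illustration.
    """
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    # The dev set is carved out of the remaining training data,
    # as the paper states ("randomly separated from the training data").
    n_dev = int(len(rest) * dev_ratio)
    dev, train = rest[:n_dev], rest[n_dev:]
    return train, dev, test
```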
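For the Experiment Setup row, the reported hyperparameters can be collected into a short PyTorch sketch. Only the embedding lengths (50/20/5), the filter counts (30/40), the window length n = 5, and the AdaGrad optimizer come from the paper; the class name MorphologyAwareNet, the vocabulary sizes, the first kernel width, the pooling, the scoring head, and the learning rate are placeholders, and the kernel width of 5 in the second layer only loosely stands in for the paper's n = 5 word window.

```python
import torch
import torch.nn as nn

class MorphologyAwareNet(nn.Module):
    def __init__(self, n_roots=50000, n_pos=30, n_feats=100):
        super().__init__()
        self.root_emb = nn.Embedding(n_roots, 50)  # root embedding length 50 (from the paper)
        self.pos_emb = nn.Embedding(n_pos, 20)     # POS embedding length 20 (from the paper)
        self.feat_emb = nn.Embedding(n_feats, 5)   # other morphological features, length 5 (from the paper)
        in_dim = 50 + 20 + 5
        self.conv1 = nn.Conv1d(in_dim, 30, kernel_size=3, padding=1)  # 30 filters; kernel width assumed
        self.conv2 = nn.Conv1d(30, 40, kernel_size=5, padding=2)      # 40 filters over an n = 5 window
        self.out = nn.Linear(40, 1)  # scoring head is an assumption

    def forward(self, roots, pos, feats):
        # roots/pos/feats: (batch, seq_len) index tensors
        x = torch.cat([self.root_emb(roots),
                       self.pos_emb(pos),
                       self.feat_emb(feats)], dim=-1)
        x = x.transpose(1, 2)            # (batch, channels, seq_len) for Conv1d
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = x.max(dim=2).values          # global max pooling (assumed)
        return self.out(x)

model = MorphologyAwareNet()
# SGD with AdaGrad, as reported; the learning rate is an assumption.
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
```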