Morphological Segmentation with Window LSTM Neural Networks

Authors: Linlin Wang, Zhu Cao, Yu Xia, Gerard de Melo

Venue: AAAI 2016

Reproducibility assessment: each variable is listed below with its result and the supporting LLM response.
Research Type: Experimental. Experiments on multiple languages confirm the effectiveness of our models on this task. Hyperparameters such as window size, learning rate, and dropout rate were tuned on the development data; the specific settings are quoted under Experiment Setup below.
Researcher Affiliation: Academia. Linlin Wang, Zhu Cao, Yu Xia, and Gerard de Melo; Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China; {ll-wang13, cao-z13, xiay12}@mails.tsinghua.edu.cn, gdm@demelo.org.
Pseudocode: No. The paper describes the model architectures using figures and equations but does not include structured pseudocode or algorithm blocks. (A generic illustrative sketch, not the paper's architecture, appears after this list.)
Open Source Code: No. The paper does not provide any concrete access information for the source code of the described methodology (e.g., a repository link, an explicit statement of code release, or a mention of code in supplementary materials).
Open Datasets: Yes. S&B data: this well-known data set by Snyder and Barzilay (2008) was derived from biblical text and consists of Hebrew, Arabic, Aramaic, and English terms together with their frequencies.
Dataset Splits: Yes. We follow the commonly used method to partition the data into training, development (tuning), and test splits (Ruokolainen et al. 2013). This involves sorting the inputs according to their frequency and assigning every fifth term starting from the first one into the test set and every fifth term starting from the second into the development set, while leaving the remaining data as training data. (An illustrative sketch of this split appears after this list.)
Hardware Specification: No. The paper does not mention any specific hardware details used for running the experiments.
Software Dependencies: No. The paper mentions techniques and models (e.g., LSTM, Softmax) but does not provide specific software dependencies, i.e., library names with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x).
Experiment Setup: Yes. We set the experimental parameters such as window size, learning rate, dropout rate, evaluation batch size, validation perplexity, threshold, etc. for the models using the development data. For instance, for 100% Hebrew training, we set the dropout applied right after the encoder LSTM to 0.25 in the Window LSTM architecture; we set the encoder dropout to 0.3 and the learning rate to 0.0005 for the Multi-Window LSTM (MW-LSTM) architecture; and we set the encoder dropout to 0.2, the learning rate to 0.00065, the gradient clipping threshold to 10, the decay rate to 0.5, and the momentum to 0.01 for the Bidirectional Multi-Window LSTM (BMW-LSTM). Window sizes were chosen from {3, 5, 7}. (These settings are collected into a configuration sketch after this list.)
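The split procedure quoted under Dataset Splits can be written out as a short script. The following is a minimal sketch; it assumes that "sorting according to frequency" means descending order (the quoted text does not state the direction), and the function name and input format are our own rather than taken from the paper.

def split_by_frequency(terms_with_freq):
    """Sort (term, frequency) pairs by frequency and assign every fifth term,
    starting from the first, to the test set and every fifth term, starting
    from the second, to the development set; the rest is training data."""
    ranked = sorted(terms_with_freq, key=lambda tf: tf[1], reverse=True)
    train, dev, test = [], [], []
    for i, (term, freq) in enumerate(ranked):
        if i % 5 == 0:       # 1st, 6th, 11th, ... -> test
            test.append(term)
        elif i % 5 == 1:     # 2nd, 7th, 12th, ... -> development
            dev.append(term)
        else:                # remaining terms -> training
            train.append(term)
    return train, dev, test

# Usage sketch (hypothetical input):
# train, dev, test = split_by_frequency([("shalom", 12), ("melek", 7), ("dabar", 3)])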
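The hyperparameters quoted under Experiment Setup for the 100% Hebrew setting can be collected into a single configuration for reference. The dictionary below is only a convenience view: the key names are assumptions, only the values stated above are filled in, and anything the quoted text does not report is left out rather than guessed.

# Reported settings for 100% Hebrew training; key names are assumptions.
HEBREW_100_HPARAMS = {
    "window_size_candidates": [3, 5, 7],     # chosen on the development data
    "W-LSTM":   {"encoder_dropout": 0.25},
    "MW-LSTM":  {"encoder_dropout": 0.30, "learning_rate": 0.0005},
    "BMW-LSTM": {"encoder_dropout": 0.20, "learning_rate": 0.00065,
                 "grad_clip": 10.0, "decay_rate": 0.5, "momentum": 0.01},
}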
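The paper defines its Window LSTM architectures through figures and equations rather than pseudocode. Purely to illustrate the general idea of predicting morph boundaries from a fixed character window with an LSTM encoder, here is a minimal PyTorch-style sketch; it is not the authors' architecture, and every class name, dimension, and default value below is an assumption.

import torch
import torch.nn as nn

class WindowLSTMSegmenter(nn.Module):
    """Illustrative sketch only: encode a fixed character window with an LSTM
    and classify whether a morph boundary follows the window's center
    character. Not the paper's (multi-)window architecture."""

    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64, dropout=0.25):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)           # dropout after the encoder
        self.classifier = nn.Linear(hidden_dim, 2)   # boundary vs. no boundary

    def forward(self, window_ids):
        embedded = self.embed(window_ids)            # (batch, window, embed_dim)
        _, (hidden, _) = self.encoder(embedded)      # hidden: (1, batch, hidden_dim)
        return self.classifier(self.dropout(hidden[-1]))  # (batch, 2) logits

# Usage sketch: a batch of two windows of size 5 over a 40-character alphabet.
model = WindowLSTMSegmenter(vocab_size=40)
logits = model(torch.randint(0, 40, (2, 5)))         # -> shape (2, 2)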