Learning Multi-Level Dependencies for Robust Word Recognition

Authors: Zhiwei Wang, Hui Liu, Jiliang Tang, Songfan Yang, Gale Yan Huang, Zitao Liu

AAAI 2020, pp. 9250-9257 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to verify the effectiveness of the framework. The results show that the proposed framework outperforms state-of-the-art methods by a large margin and they also suggest that character-level dependencies can play an important role in word recognition.
Researcher Affiliation | Collaboration | Zhiwei Wang (1), Hui Liu (1), Jiliang Tang (1), Songfan Yang (2), Gale Yan Huang (2), Zitao Liu (2); (1) Michigan State University, {wangzh65, liuhui7, tangjili}@msu.edu; (2) TAL AI Lab, TAL Education Group, {yangsongfan, galehuang, liuzitao}@100tal.com
Pseudocode | No | The paper describes the model architecture and training procedures in text but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code of the proposed framework and the major experiments are publicly available [1]. [1] https://github.com/DSE-MSU/MUDE
Open Datasets | Yes | We use the publicly available Penn Treebank (Marcus, Santorini, and Marcinkiewicz 1993) as the dataset.
Dataset Splits | Yes | We use the same training, validation and testing split in (Sakaguchi et al. 2017), which contains 39,832, 1,700 and 2,416 sentences, respectively. (A split-size sanity check follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper mentions "Pytorch" but does not specify any version numbers for PyTorch or other software dependencies, which is required for reproducibility.
Experiment Setup | Yes | The number of hidden units of word representations is set to 650, as suggested by previous work (Sakaguchi et al. 2017). The learning rate is chosen from {0.1, 0.01, 0.001, 0.0001} and β in Eq. (11) is chosen from {1, 0.1, 0.001} according to model performance on the validation datasets. The parameters of MUDE are learned with the stochastic gradient descent algorithm, and RMSprop (Tieleman and Hinton 2012) is chosen as the optimizer, following (Sakaguchi et al. 2017). (A hedged training-configuration sketch follows the table.)
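
The split sizes above are easy to sanity-check once the data files are on disk. The sketch below counts non-empty lines per split and compares them to the reported 39,832 / 1,700 / 2,416 sentences; the data/ directory and the file names are hypothetical placeholders, not the actual layout of the MUDE repository.

```python
# Minimal sanity check of the reported Penn Treebank split sizes.
# NOTE: the directory and file names below are hypothetical placeholders,
# not the actual layout of https://github.com/DSE-MSU/MUDE.
from pathlib import Path

EXPECTED = {"train.txt": 39832, "valid.txt": 1700, "test.txt": 2416}

def count_sentences(path: Path) -> int:
    """Count non-empty lines, assuming one sentence per line."""
    with path.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

for name, expected in EXPECTED.items():
    path = Path("data") / name  # adjust to your local copy of the split
    if not path.exists():
        print(f"{name}: missing (adjust the path to your local copy)")
        continue
    actual = count_sentences(path)
    status = "OK" if actual == expected else "MISMATCH"
    print(f"{name}: {actual} sentences (expected {expected}) -> {status}")
```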
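
The experiment-setup row amounts to a small hyperparameter grid around a fixed architecture size. The sketch below expresses that grid in PyTorch; the TinyModel, the synthetic batches, and the stand-in character-level loss term are assumptions for illustration only, while the hidden size (650), the learning-rate grid, the β grid, and the RMSprop optimizer come from the description quoted above.

```python
# Minimal, self-contained sketch of the reported training setup, assuming PyTorch.
# TinyModel and the random data are stand-ins for MUDE; only the hidden size (650),
# the learning-rate grid, the beta grid, and the RMSprop optimizer come from the paper.
import itertools
import torch
import torch.nn as nn

HIDDEN_SIZE = 650                       # word-representation hidden units (from the paper)
LR_GRID = [0.1, 0.01, 0.001, 0.0001]    # learning rates searched on the validation set
BETA_GRID = [1, 0.1, 0.001]             # beta weighting the character-level loss (Eq. (11))

class TinyModel(nn.Module):
    """Placeholder for MUDE: one recurrent layer over word embeddings."""
    def __init__(self, vocab_size=1000, hidden_size=HIDDEN_SIZE):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)

def run_config(lr, beta, steps=5):
    """Train a few steps on synthetic data and return the final combined loss."""
    torch.manual_seed(0)
    model = TinyModel()
    optimizer = torch.optim.RMSprop(model.parameters(), lr=lr)  # optimizer used in the paper
    criterion = nn.CrossEntropyLoss()
    for _ in range(steps):
        tokens = torch.randint(0, 1000, (8, 20))    # synthetic batch: 8 sentences x 20 tokens
        targets = torch.randint(0, 1000, (8, 20))
        logits = model(tokens)
        word_loss = criterion(logits.reshape(-1, 1000), targets.reshape(-1))
        char_loss = word_loss.detach() * 0.5         # stand-in for the character-level term
        loss = word_loss + beta * char_loss          # Eq. (11)-style weighted combination
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item()

# Grid search over the reported hyperparameter ranges.
for lr, beta in itertools.product(LR_GRID, BETA_GRID):
    print(f"lr={lr}, beta={beta}: final loss {run_config(lr, beta):.3f}")
```

In the actual setup, each (learning rate, β) configuration would be selected by validation performance rather than by the toy loss printed here.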