Learning Task-Specific Representation for Novel Words in Sequence Labeling

Authors: Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Jinlan Fu, Xuanjing Huang

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech tagging (POS) tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method can achieve better or competitive performance on the OOV problem compared with existing state-of-the-art methods.
Researcher Affiliation | Academia | Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Jinlan Fu and Xuanjing Huang, School of Computer Science, Fudan University, Shanghai, China {mlpeng16, qz, xyxing18, tgui16, fujl16, xjhuang}@fudan.edu.cn
Pseudocode | Yes | Algorithm 1: Training of the student network (a hedged sketch of such a training loop is given after this table)
Open Source Code | Yes | Source code of this work is available at https://github.com/v-mipeng/TaskOOV.
Open Datasets | Yes | For POS, we conducted experiments on: (1) PTB-English: the Wall Street Journal portion of the English Penn Treebank dataset [Marcus et al., 1993], (2) RIT-English: a dataset created from Tweets in English [Derczynski et al., 2013], (3) GSD-Russian: the Russian Universal Dependencies Treebank annotated and converted by Google, and (4) RRT-Romanian: the Romanian UD treebank (called RoRefTrees) [Verginica Barbu Mititelu, 2016].
Dataset Splits | Yes | For PTB-English, we followed the standard splits: sections 2-21 for training, section 22 for validation, and section 23 for testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments, only general setup information.
Software Dependencies | No | The paper mentions using the Adam optimizer and specific network dimensions but does not provide version numbers for programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | Dimension of word embedding, character embedding, and LSTM were respectively set to 50, 16, and 50 for both the teacher and student networks. Kernel size of the character CNN was set to 25 for kernel widths 3 and 5. Optimization was performed using the Adam step rule [Kingma and Ba, 2015] with the learning rate set to 1e-3. (A configuration sketch follows below.)
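
The hyperparameters quoted in the Experiment Setup row are enough to reconstruct the rough shape of the tagger. Below is a minimal PyTorch sketch of that configuration, assuming a standard char-CNN + BiLSTM sequence labeler; all module and variable names are illustrative assumptions and are not taken from the authors' released code.

```python
# Hedged sketch of the reported configuration: word emb 50, char emb 16,
# LSTM hidden 50, char CNN with 25 kernels each for widths 3 and 5,
# Adam with lr 1e-3. Names are illustrative, not from the released code.
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character encoder: 25 kernels per width (3 and 5), max-pooled."""
    def __init__(self, char_vocab_size, char_emb_dim=16, num_kernels=25):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, char_emb_dim)
        self.convs = nn.ModuleList([
            nn.Conv1d(char_emb_dim, num_kernels, kernel_size=w, padding=w // 2)
            for w in (3, 5)
        ])

    def forward(self, char_ids):                      # (batch, max_word_len)
        x = self.char_emb(char_ids).transpose(1, 2)   # (batch, emb, len)
        # Max-pool each feature map over the character positions.
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=1)                # (batch, 2 * 25 = 50)

class Tagger(nn.Module):
    """Word embedding (50) + char features (50) fed to a BiLSTM (hidden 50)."""
    def __init__(self, word_vocab_size, char_vocab_size, num_tags):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab_size, 50)
        self.char_cnn = CharCNN(char_vocab_size)
        self.lstm = nn.LSTM(100, 50, bidirectional=True, batch_first=True)
        self.out = nn.Linear(100, num_tags)

    def forward(self, word_ids, char_ids):
        # char_ids: (batch, seq_len, max_word_len)
        b, s, c = char_ids.shape
        char_feats = self.char_cnn(char_ids.reshape(b * s, c)).reshape(b, s, -1)
        x = torch.cat([self.word_emb(word_ids), char_feats], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                            # per-token tag scores

model = Tagger(word_vocab_size=10000, char_vocab_size=100, num_tags=45)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, lr 1e-3
```

The per-width max-pooled CNN features concatenate to a 50-dimensional character representation, which is consistent with the 25-kernels-per-width figure quoted in the table.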
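
For the Pseudocode row above: the paper's Algorithm 1 trains a student network to produce task-specific representations for (possibly novel) words. A minimal, hedged sketch of a distillation-style training step of this kind is given below; the frozen-teacher setup, the MSE objective, and all function names are assumptions rather than the paper's exact algorithm.

```python
# Hedged, simplified sketch of a distillation-style training step for the
# student network: the student predicts, from characters alone, the
# task-specific word representation produced by a frozen teacher. The MSE
# objective and all names here are illustrative assumptions; consult the
# paper's Algorithm 1 for the exact procedure.
import torch
import torch.nn as nn

def train_student(student, teacher, loader, epochs=5):
    teacher.eval()                                    # teacher is frozen
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for word_ids, char_ids in loader:
            with torch.no_grad():
                target = teacher(word_ids)            # task-specific word reps
            pred = student(char_ids)                  # char-based reps
            loss = mse(pred, target)                  # match the teacher
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

In this setup only the student's character-based encoder is updated, so at test time it can produce representations for out-of-vocabulary words that the teacher's word-embedding lookup cannot cover, which is the OOV problem the paper targets.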