Learning Task-Specific Representation for Novel Words in Sequence Labeling
Authors: Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Jinlan Fu, Xuanjing Huang
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech tagging (POS) tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method can achieve better or competitive performance on the OOV problem compared with existing state-of-the-art methods. |
| Researcher Affiliation | Academia | Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Jinlan Fu and Xuanjing Huang, School of Computer Science, Fudan University, Shanghai, China. {mlpeng16, qz, xyxing18, tgui16, fujl16, xjhuang}@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training of the student network |
| Open Source Code | Yes | Source code of this work is available at https://github.com/v-mipeng/TaskOOV. |
| Open Datasets | Yes | For POS, we conducted experiments on: (1) PTB-English: the Wall Street Journal portion of the English Penn Treebank dataset [Marcus et al., 1993], (2) RIT-English: a dataset created from Tweets in English [Derczynski et al., 2013], (3) GSD-Russian: the Russian Universal Dependencies Treebank annotated and converted by Google, and (4) RRT-Romanian: the Romanian UD treebank (called RoRefTrees) [Verginica Barbu Mititelu, 2016]. |
| Dataset Splits | Yes | For PTB-English, we followed the standard splits: sections 2-21 for training, section 22 for validation, and section 23 for testing (see the split sketch after the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments, only general setup information. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and specific network dimensions but does not provide version numbers for programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | Dimensions of the word embedding, character embedding, and LSTM were set to 50, 16, and 50, respectively, for both the teacher and student networks. The character CNN used 25 kernels for each of kernel widths 3 and 5. Optimization was performed using the Adam step rule [Kingma and Ba, 2015] with the learning rate set to 1e-3 (see the model sketch after the table). |
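
The standard WSJ split quoted in the Dataset Splits row maps onto a simple configuration. Below is a minimal Python sketch of that split (sections 2-21 train, 22 validation, 23 test); the directory layout, file extension, and helper name are illustrative assumptions, not details from the paper.

```python
from pathlib import Path

# Standard Penn Treebank WSJ split used by the paper:
# sections 2-21 for training, 22 for validation, 23 for testing.
SPLITS = {
    "train": range(2, 22),  # sections 02..21
    "valid": (22,),         # section 22
    "test": (23,),          # section 23
}

def section_files(root, split):
    """Yield parse files for every WSJ section belonging to `split`.

    Assumes a layout like <root>/02/wsj_0200.mrg (illustrative only).
    """
    for sec in SPLITS[split]:
        yield from sorted(Path(root).glob(f"{sec:02d}/*.mrg"))
```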
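
The quoted Experiment Setup row can be read as a concrete model configuration. The following PyTorch sketch wires a word-embedding + character-CNN + LSTM tagger with those hyperparameters (50-d word embeddings, 16-d character embeddings, 50-d LSTM, 25 character-CNN kernels for each of widths 3 and 5, Adam at lr 1e-3). The class name, module structure, bidirectionality of the LSTM, and vocabulary/tag sizes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class CharCnnWordTagger(nn.Module):
    """Illustrative tagger matching the quoted hyperparameters."""

    def __init__(self, vocab_size, char_vocab_size, num_tags):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 50)       # word dim 50
        self.char_emb = nn.Embedding(char_vocab_size, 16)  # char dim 16
        # 25 kernels for each of widths 3 and 5 -> 50-d char feature.
        self.char_convs = nn.ModuleList(
            nn.Conv1d(16, 25, kernel_size=k, padding=k // 2) for k in (3, 5)
        )
        # Bidirectionality is our assumption; hidden size 50 as quoted.
        self.lstm = nn.LSTM(50 + 50, 50, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * 50, num_tags)

    def forward(self, words, chars):
        # words: (batch, seq); chars: (batch, seq, max_word_len)
        b, s, c = chars.shape
        ce = self.char_emb(chars).view(b * s, c, 16).transpose(1, 2)
        # Max-pool each convolution over character positions.
        cf = torch.cat(
            [conv(ce).max(dim=2).values for conv in self.char_convs], dim=1
        ).view(b, s, -1)                                   # (batch, seq, 50)
        x = torch.cat([self.word_emb(words), cf], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)

# Adam with learning rate 1e-3, as quoted; sizes below are placeholders.
model = CharCnnWordTagger(vocab_size=10000, char_vocab_size=100, num_tags=45)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
```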