Switch-LSTMs for Multi-Criteria Chinese Word Segmentation
Authors: Jingjing Gong, Xinchi Chen, Tao Gui, Xipeng Qiu
AAAI 2019, pp. 6457-6464
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our model obtains significant improvements on eight corpora with heterogeneous segmentation criteria, compared to the previous method and single-criterion learning. Table 3: Results of the proposed model on the test sets of eight CWS datasets. |
| Researcher Affiliation | Academia | Jingjing Gong, Xinchi Chen, Tao Gui, Xipeng Qiu Shanghai Key Laboratory of Intelligent Information Processing, Fudan University School of Computer Science, Fudan University Shanghai Institute of Intelligent Electronics and Systems {jjgong, xinchichen13, tgui16, xpqiu}@fudan.edu.cn |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulations but does not include explicit pseudocode or algorithm blocks; an illustrative sketch of the switch mechanism is given after this table. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Datasets We experiment on eight CWS datasets from SIGHAN2005 (Emerson 2005) and SIGHAN2008 (Jin and Chen 2008). The three commonly-used corpora, PKU's People's Daily (PKU) (Yu et al. 2001), Penn Chinese Treebank (CTB) (Fei 2000) and MSRA (Emerson 2005), use different segmentation criteria. |
| Dataset Splits | Yes | We randomly pick 10% instances from training set as the development set for all the datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'word2vec toolkit (Mikolov et al. 2013)' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | The character embedding size d_e is set to 100, the task embedding size is set to 20, the hidden size d_h for our proposed Switch-LSTMs is set to 100, and the number of choices K in the K-way switch is set to one of {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. As a common approach to alleviate overfitting, we dropout our embedding with a probability of 0.2. Other than embedding, we use the Xavier uniform initializer for all trainable parameters in our model. For each training step, we sample 6 tasks from the task pool, each with a batch size of 128... The training process is terminated after 7 epochs without improvement on the development set. (These settings are collected in the configuration sketch after this table.) |
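Since the paper provides no pseudocode, the following is a minimal PyTorch sketch of one plausible reading of the K-way switch it describes: a softmax switch, conditioned on the task embedding and the previous hidden state, mixes the outputs of K candidate LSTM cells at every time step. The class and argument names (`SwitchLSTM`, `task_emb`, `num_switches`) are hypothetical, and this is an illustration of the general mechanism, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwitchLSTM(nn.Module):
    """K candidate LSTM cells mixed per step by a task-conditioned switch.

    Hypothetical sketch: the switch distribution over the K cells is
    computed from [h_{t-1}; task embedding], and the next (h, c) state is
    the weighted mixture of the candidate cells' outputs.
    """

    def __init__(self, input_size, hidden_size, task_size, num_switches):
        super().__init__()
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size, hidden_size) for _ in range(num_switches)
        )
        # Switch scores the K cells from [h_{t-1}; task embedding].
        self.switch = nn.Linear(hidden_size + task_size, num_switches)

    def forward(self, x, task_emb):
        # x: (batch, seq_len, input_size); task_emb: (batch, task_size)
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.cells[0].hidden_size)
        c = torch.zeros_like(h)
        outputs = []
        for t in range(seq_len):
            alpha = F.softmax(
                self.switch(torch.cat([h, task_emb], dim=-1)), dim=-1
            )  # (batch, K) mixture weights over the candidate cells
            # Run every candidate cell, then mix hidden and cell states.
            states = [cell(x[:, t], (h, c)) for cell in self.cells]
            h = sum(alpha[:, i:i + 1] * hi for i, (hi, _) in enumerate(states))
            c = sum(alpha[:, i:i + 1] * ci for i, (_, ci) in enumerate(states))
            outputs.append(h)
        return torch.stack(outputs, dim=1)  # (batch, seq_len, hidden_size)


# Example usage with the paper's reported sizes (K = 5 chosen arbitrarily):
# out = SwitchLSTM(100, 100, 20, 5)(torch.randn(2, 7, 100), torch.randn(2, 20))
```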
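For convenience, the hyperparameters quoted in the Experiment Setup and Dataset Splits rows can be collected into a single configuration. In the sketch below, only the numeric values come from the paper; the dictionary keys, the placeholder K value of 5, and the `init_weights` helper are assumptions, and no optimizer is listed because the quoted excerpt does not name one.

```python
import torch.nn as nn

# Values transcribed from the paper's quoted setup; names are hypothetical.
CONFIG = {
    "char_emb_size": 100,   # character embedding size d_e
    "task_emb_size": 20,    # task embedding size
    "hidden_size": 100,     # Switch-LSTM hidden size d_h
    "num_switches": 5,      # K, searched over {1, ..., 10}; 5 is a placeholder
    "emb_dropout": 0.2,     # dropout applied to embeddings only
    "tasks_per_step": 6,    # tasks sampled from the task pool per step
    "batch_size": 128,      # batch size per sampled task
    "patience": 7,          # stop after 7 epochs without dev improvement
    "dev_fraction": 0.1,    # random 10% of each training set held out as dev
}


def init_weights(model: nn.Module) -> None:
    """Xavier-uniform init for all non-embedding trainable parameters."""
    for name, param in model.named_parameters():
        if "emb" not in name and param.dim() > 1:
            nn.init.xavier_uniform_(param)
```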