Neural Models for Sequence Chunking
Authors: Feifei Zhai, Saloni Potdar, Bing Xiang, Bowen Zhou
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed neural sequence chunking models can achieve state-of-the-art performance on both the text chunking and slot filling tasks. We conduct experiments on text chunking and semantic slot filling respectively to test the performance of the neural sequence chunking models we propose in this paper. |
| Researcher Affiliation | Industry | Feifei Zhai, Saloni Potdar, Bing Xiang, Bowen Zhou (IBM Watson, 1101 Kitchawan Road, Yorktown Heights, NY 10598; {fzhai,potdars,bingxia,zhou}@us.ibm.com) |
| Pseudocode | No | The paper describes models and equations (Formulas 1 through 8) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We use the CoNLL 2000 shared task (Tjong Kim Sang and Buchholz 2000) dataset for text chunking. The first one is the ATIS dataset, which consists of reservation requests from the air travel domain. We also use a larger dataset by combining the ATIS corpus with the MIT Restaurant Corpus and MIT Movie Corpus (Liu et al. 2013a; 2013b). |
| Dataset Splits | Yes | For the CoNLL 2000 dataset, 'we hold out 10% of the training data (selected at random) as the validation set.' For ATIS and the LARGE dataset, 'We randomly selected 80% of the training data for model training and the rest 20% as the validation set.' The paper also provides detailed hyperparameter settings and training configurations. |
| Hardware Specification | No | The paper describes the neural network models and training parameters, but it does not specify any details about the hardware (e.g., specific GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'SGD' for training and 'conlleval.pl' for evaluation, but it does not provide specific version numbers for these or any other software libraries (e.g., deep learning frameworks like TensorFlow or PyTorch) used in the experiments. |
| Experiment Setup | Yes | For the two tasks, we use a hidden state size of 100 for the forward and backward LSTMs respectively in the Bi-LSTM, and a size of 200 for the LSTM decoder. We use dropout with rate 0.5 on both the input and output of all LSTMs. The mini-batch size is set to 1. The number of training epochs is limited to 200 for text chunking and 100 for slot filling. For the CNN used in Models II and III to extract chunk features, the filter size is the same as the word embedding dimension, and the filter window size is 2. We adopt SGD to train the model and, by grid search, tune the initial learning rate in [0.01, 0.1], the learning rate decay in [1e-6, 1e-4], and the context window size in {1, 3, 5}. |
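
As a rough, non-authoritative illustration of the setup quoted in the last row, the sketch below wires the reported sizes (100-per-direction Bi-LSTM encoder, 200-unit LSTM decoder, dropout 0.5 on LSTM inputs and outputs, SGD with mini-batch size 1) into a model skeleton. This is a minimal sketch under stated assumptions, not the authors' implementation: the use of PyTorch, the vocabulary size, the label count, the linear tagging head, and the inverse-time learning-rate decay are all assumptions, and the chunk-level CNN of Models II/III is omitted.

```python
# Minimal sketch (assumed PyTorch; not the authors' released code) of the
# encoder/decoder sizes and training settings reported in the table above.
# Vocabulary size, label count, and the linear tagging head are placeholders;
# the chunk-level CNN of Models II/III is omitted.
import torch
import torch.nn as nn

EMB_DIM = 100        # assumed word-embedding dimension
ENC_HIDDEN = 100     # per-direction hidden size of the Bi-LSTM encoder
DEC_HIDDEN = 200     # hidden size of the LSTM decoder
DROPOUT = 0.5        # dropout on LSTM inputs and outputs

class ChunkingSketch(nn.Module):
    def __init__(self, vocab_size=10000, num_labels=23):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMB_DIM)
        self.drop = nn.Dropout(DROPOUT)
        self.encoder = nn.LSTM(EMB_DIM, ENC_HIDDEN,
                               bidirectional=True, batch_first=True)
        self.decoder = nn.LSTM(2 * ENC_HIDDEN, DEC_HIDDEN, batch_first=True)
        self.out = nn.Linear(DEC_HIDDEN, num_labels)

    def forward(self, token_ids):
        x = self.drop(self.embed(token_ids))   # dropout on the LSTM input
        enc, _ = self.encoder(x)
        dec, _ = self.decoder(self.drop(enc))  # dropout on the encoder output
        return self.out(self.drop(dec))

model = ChunkingSketch()
# SGD with mini-batch size 1; the initial learning rate and decay would be
# grid-searched in [0.01, 0.1] and [1e-6, 1e-4]. Per-step inverse-time decay
# is an assumption here, not stated in the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: 1.0 / (1.0 + 1e-4 * step))
```

In use, each single-sentence mini-batch would be followed by `optimizer.step()` and then `scheduler.step()`, mirroring the batch size of 1 and the learning-rate decay reported above.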