Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study

Authors: Jinlan Fu, Pengfei Liu, Qi Zhang (pp. 7732-7739)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models in terms of breakdown performance analysis, annotation errors, dataset bias, and category relationships, which suggest directions for improvement.
Researcher Affiliation | Academia | Jinlan Fu, Pengfei Liu, Qi Zhang; Shanghai Key Laboratory of Intelligent Information Processing, Fudan University; Research Institute of Intelligent and Complex Systems, Fudan University; School of Computer Science, Fudan University; 825 Zhangheng Road, Shanghai, China; {fujl16, pfliu14, qz}@fudan.edu.cn
Pseudocode | Yes | Algorithm 1: Consistency calculation and evaluation for Named Entity Recognition (an illustrative evaluation sketch follows the table).
Open Source Code | No | We have released the datasets (ReCoNLL, PLONER) for the future research at our project page: http://pfliu.com/InterpretNER/. The paper explicitly states that the datasets are released, but not the source code for the methodology or implementation described in the paper.
Open Datasets | Yes | We conduct experiments on three benchmark datasets: the CoNLL2003 NER dataset, the WNUT16 dataset, and the OntoNotes 5.0 dataset. The CoNLL2003 NER dataset (Sang and De Meulder 2003) is based on Reuters data (Collobert et al. 2011). The WNUT16 dataset is provided by the second shared task at WNUT-2016. The OntoNotes 5.0 dataset (Weischedel et al. 2013)...
Dataset Splits | No | The paper mentions the use of training, validation, and test sets (e.g., in Algorithm 1, where 'multiple subsets of validation data Dval' are mentioned). However, it does not provide specific split percentages or sample counts for the validation sets of any of the datasets used (CoNLL2003, WNUT16, OntoNotes 5.0, or PLONER).
Hardware Specification | No | The paper does not mention the hardware used to run the experiments, such as GPU models, CPU specifications, or memory configurations.
Software Dependencies | No | The paper mentions various models and embeddings used (e.g., LSTM, CRF, BERT, ELMo, FLAIR, GloVe, CNN, MLP), but does not provide specific version numbers for any software dependencies, libraries, or frameworks (e.g., TensorFlow, PyTorch, scikit-learn) required to reproduce the experiments.
Experiment Setup | No | The paper describes general architectural choices for the different experiments (e.g., 'All models adopt LSTM as sentence encoder and CRF as the decoder', the CnoneWrandLstmCrf model, the CcnnWgloveLstmMlp architecture), but it does not provide specific hyperparameters such as learning rates, batch sizes, numbers of training epochs, or optimizer settings for reproducibility (an illustrative configuration sketch follows the table).
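
Since only pseudocode (Algorithm 1) and the datasets are released, anyone reproducing the paper's breakdown performance analysis has to re-implement the evaluation loop themselves. The following is a minimal sketch of the span-level F1 computation that such an analysis rests on, evaluated separately per bucket of validation sentences. The BIO tag format, the bucket_of grouping function, and all names here are illustrative assumptions, not the authors' Algorithm 1 or released code.

```python
from collections import defaultdict

def extract_spans(tags):
    """Collect (start, end, type) entity spans from a BIO tag sequence.

    Stray I- tags without a leading B- are ignored in this simplified sketch.
    """
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel flushes the last open span
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and etype != tag[2:]):
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        # an I- tag matching the open span's type simply extends it
    return spans

def span_f1(gold_sents, pred_sents):
    """Micro-averaged span-level precision/recall/F1 over parallel tag sequences."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = set(extract_spans(gold)), set(extract_spans(pred))
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    prec = tp / n_pred if n_pred else 0.0
    rec = tp / n_gold if n_gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def breakdown_f1(gold_sents, pred_sents, bucket_of):
    """Report F1 per bucket (e.g., grouped by entity length, sentence length, or OOV density)."""
    buckets = defaultdict(lambda: ([], []))
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = buckets[bucket_of(gold)]
        g.append(gold)
        p.append(pred)
    return {name: span_f1(g, p) for name, (g, p) in buckets.items()}
```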
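
Because only these architectural choices are stated, a reproduction also has to choose its own hyperparameters. Below is one plausible PyTorch sketch of a word-level BiLSTM-CRF tagger of the kind the paper describes; the embedding and hidden sizes, the optimizer settings, and the use of the third-party pytorch-crf package are assumptions for illustration and do not come from the paper.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (assumed dependency, not named in the paper)

class BiLstmCrfTagger(nn.Module):
    """Word embeddings -> BiLSTM sentence encoder -> linear emissions -> CRF decoder."""

    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200, dropout=0.5):
        super().__init__()
        # Randomly initialized word embeddings; the paper also studies GloVe/ELMo/BERT/FLAIR.
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.to_emissions = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, token_ids, tags=None, mask=None):
        feats, _ = self.encoder(self.embed(token_ids))
        emissions = self.to_emissions(self.dropout(feats))
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequences under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding returns one list of tag ids per sentence.
        return self.crf.decode(emissions, mask=mask)

# Illustrative settings only; learning rate, batch size, and epochs are not given in the paper.
model = BiLstmCrfTagger(vocab_size=20000, num_tags=9)
optimizer = torch.optim.SGD(model.parameters(), lr=0.015, momentum=0.9)
```

The random nn.Embedding layer could be swapped for pretrained GloVe vectors or contextual embeddings (ELMo, BERT, FLAIR) to match the other configurations the paper compares.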