Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study
Authors: Jinlan Fu, Pengfei Liu, Qi Zhang (pp. 7732-7739)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models in terms of breakdown performance analysis, annotation errors, dataset bias, and category relationships, which suggest directions for improvement. |
| Researcher Affiliation | Academia | Jinlan Fu, Pengfei Liu, Qi Zhang. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University; Research Institute of Intelligent and Complex Systems, Fudan University; School of Computer Science, Fudan University; 825 Zhangheng Road, Shanghai, China. {fujl16, pfliu14, qz}@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1 Consistency calculation and evaluation for Named Entity Recognition |
| Open Source Code | No | "We have released the datasets (ReCoNLL, PLONER) for the future research at our project page: http://pfliu.com/InterpretNER/." The paper explicitly states the release of datasets, but not of the source code for the methodology or implementation described in the paper. |
| Open Datasets | Yes | We conduct experiments on three benchmark datasets: the CoNLL2003 NER dataset, the WNUT16 dataset, and the OntoNotes 5.0 dataset. The CoNLL2003 NER dataset (Sang and De Meulder 2003) is based on Reuters data (Collobert et al. 2011). The WNUT16 dataset is provided by the second shared task at WNUT-2016. The OntoNotes 5.0 dataset (Weischedel et al. 2013)... |
| Dataset Splits | No | The paper mentions the use of training, validation, and test sets (e.g., in Algorithm 1, where 'multiple subsets of validation data Dval' are mentioned). However, it does not provide specific details on the split percentages or sample counts for the validation sets across any of the datasets used (CoNLL2003, WNUT16, OntoNotes 5.0, or PLONER). |
| Hardware Specification | No | No specific hardware details, such as GPU models, CPU specifications, or memory configurations, used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions various models and embeddings used (e.g., LSTM, CRF, BERT, ELMo, FLAIR, GloVe, CNN, MLP), but does not provide specific version numbers for any software dependencies, libraries, or frameworks (e.g., TensorFlow, PyTorch, scikit-learn) required to reproduce the experiments. |
| Experiment Setup | No | The paper describes general architectural choices for different experiments (e.g., 'All models adopt LSTM as sentence encoder and CRF as the decoder', the 'CnoneWrandLstmCrf' model, the 'CcnnWgloveLstmMlp' architecture), but it does not provide specific hyperparameters such as learning rates, batch sizes, number of training epochs, or optimizer settings for reproducibility. |
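Several of the variables above turn on entity-level evaluation (the Pseudocode row's Algorithm 1, the breakdown performance analysis). As a point of reference only, here is a minimal sketch of standard exact-match entity-level F1 over BIO tag sequences; this is not the paper's Algorithm 1, and the helper names `extract_spans` and `entity_f1` are illustrative.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO tag sequence.

    `end` is exclusive. Stray I- tags (no matching B-, or a type change)
    terminate the current span rather than extending it.
    """
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.add((start, i, etype))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # span extends through this token
        else:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = None, None
    if start is not None:  # close a span that runs to the end of the sentence
        spans.add((start, len(tags), etype))
    return spans


def entity_f1(gold_tags, pred_tags):
    """Micro-averaged entity-level precision, recall, and F1 over sentences.

    A predicted entity counts as correct only on an exact (span, type) match.
    """
    tp = fp = fn = 0
    for gold_seq, pred_seq in zip(gold_tags, pred_tags):
        gold, pred = extract_spans(gold_seq), extract_spans(pred_seq)
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Exact-match scoring at the entity level is the convention the CoNLL2003 and WNUT16 shared tasks use, which is why hyperparameter and split details matter for reproducing the reported numbers.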