Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-Grained Student Ensemble

Authors: Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify the effectiveness of our proposed method, we conduct experiments on four DS-NER datasets. The experimental results demonstrate that our method significantly surpasses previous SOTA methods.
Researcher Affiliation | Collaboration | Huawei Cloud; School of Software Engineering, Huazhong University of Science and Technology; Peking University; Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
Pseudocode | Yes | Algorithm 1: ATSEN training.
Open Source Code | Yes | The code is available at https://github.com/zenhjunpro/ATSEN.
Open Datasets | Yes | CoNLL03 (Sang and De Meulder 2003) consists of 1393 English news articles and is annotated with four entity types: person, location, organization, and miscellaneous. OntoNotes 5.0 (Weischedel et al. 2013) contains documents from multiple domains, including broadcast conversation, P2.5 data, and Web data. Webpage (Ratinov and Roth 2009) comprises personal, academic, and computer science conference webpages. Twitter (Godin et al. 2015) is from the WNUT 2016 NER shared task.
Dataset Splits | Yes | Table 1 reports the statistics of the four DS-NER datasets; for CoNLL03, the train/dev/test splits contain 14041/3250/3453 sentences and 203621/51362/46435 tokens.
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, processors, or memory) were mentioned for running the experiments.
Software Dependencies | No | The paper mentions software such as RoBERTa, DistilRoBERTa, and LIBSVM but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The max training epoch is 50 for all datasets. The training batch size is 16 for CoNLL03, Webpage, and Twitter, and 32 for OntoNotes 5.0. The learning rate is set to 1e-5 for CoNLL03 and Webpage, and 2e-5 for OntoNotes 5.0 and Twitter. For the pretraining stage with noisy labels, we separately train 1, 2, 12, and 6 epochs for the CoNLL03, OntoNotes 5.0, Webpage, and Twitter datasets. For adaptive teacher learning, the confidence threshold σ1 is 0.9 for all datasets. In the fine-grained student ensemble, the momentum m is 0.995, 0.995, 0.99, and 0.995, and σ2 is set to 0.8, 0.995, 0.8, and 0.75 for CoNLL03, OntoNotes 5.0, Webpage, and Twitter, respectively.
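
The per-dataset settings reported above can be collected into a single configuration for reproduction attempts. The snippet below is a minimal sketch, not the authors' code: the names HYPERPARAMS, ema_update, and confident_token_mask are hypothetical, and the EMA-style update and confidence filtering are assumed forms of how the momentum m and thresholds σ1/σ2 might be applied; the actual ATSEN implementation is at https://github.com/zenhjunpro/ATSEN.

```python
# Hypothetical sketch only: gathers the hyperparameters reported above and
# illustrates an assumed EMA-style momentum update plus confidence filtering.
# For the authors' actual training loop, see https://github.com/zenhjunpro/ATSEN.
import torch

# Reported settings: max epochs, batch size, learning rate, noisy-label
# pretraining epochs, teacher threshold sigma1, ensemble momentum m, sigma2.
HYPERPARAMS = {
    "CoNLL03":       dict(max_epochs=50, batch_size=16, lr=1e-5, pretrain_epochs=1,  sigma1=0.9, m=0.995, sigma2=0.8),
    "OntoNotes 5.0": dict(max_epochs=50, batch_size=32, lr=2e-5, pretrain_epochs=2,  sigma1=0.9, m=0.995, sigma2=0.995),
    "Webpage":       dict(max_epochs=50, batch_size=16, lr=1e-5, pretrain_epochs=12, sigma1=0.9, m=0.99,  sigma2=0.8),
    "Twitter":       dict(max_epochs=50, batch_size=16, lr=2e-5, pretrain_epochs=6,  sigma1=0.9, m=0.995, sigma2=0.75),
}

@torch.no_grad()
def ema_update(ensemble: torch.nn.Module, student: torch.nn.Module, m: float) -> None:
    """Assumed momentum (EMA) update of an ensemble copy from the current student."""
    for e_param, s_param in zip(ensemble.parameters(), student.parameters()):
        e_param.mul_(m).add_(s_param, alpha=1.0 - m)

def confident_token_mask(token_probs: torch.Tensor, threshold: float) -> torch.Tensor:
    """Boolean mask of tokens whose highest class probability exceeds the threshold."""
    return token_probs.max(dim=-1).values > threshold
```

Under this sketch, the Webpage setting (m = 0.99) would mix roughly 1% of the student weights into the ensemble copy per update, and tokens would be retained as pseudo-labels only when their maximum class probability exceeds σ2 = 0.8.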