Uncertainty-Aware Self-Training for Low-Resource Neural Sequence Labeling

Authors: Jianing Wang, Chengyu Wang, Jun Huang, Ming Gao, Aoying Zhou

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments over six benchmarks demonstrate that our SeqUST framework effectively improves the performance of self-training, and consistently outperforms strong baselines by a large margin in low-resource scenarios."
Researcher Affiliation | Collaboration | (1) School of Data Science and Engineering, East China Normal University, Shanghai, China; (2) Alibaba Group, Hangzhou, China; (3) KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
Pseudocode | Yes | Algorithm 1: Self-training Procedure of SeqUST
Open Source Code | No | The paper does not explicitly state that the source code is released or provide a link to a code repository.
Open Datasets | Yes | "We choose six widely used benchmarks to evaluate our SeqUST framework, including SNIPS (Coucke et al. 2018) and MultiWOZ (Budzianowski et al. 2018) for slot filling, MIT Movie (Liu et al. 2013b), MIT Restaurant (Liu et al. 2013a), CoNLL-03 (Sang and Meulder 2003) and OntoNotes (Weischedel et al. 2013) for NER."
Dataset Splits | Yes | "For each dataset, we use a greedy-based sampling strategy to randomly select 10-shot labeled data per class for the few-shot labeled training set and validation set, while the remaining data are viewed as unlabeled data."
Hardware Specification | Yes | "We implement our framework in PyTorch 1.8 and use NVIDIA V100 GPUs for experiments."
Software Dependencies | Yes | "We implement our framework in PyTorch 1.8 and use NVIDIA V100 GPUs for experiments."
Experiment Setup | Yes | "For each dataset, we use a greedy-based sampling strategy to randomly select 10-shot labeled data per class for the few-shot labeled training set and validation set, while the remaining data are viewed as unlabeled data. During self-training, the teacher and student model share the same model architecture. In default, we choose BERT-base-uncased (Devlin et al. 2019) from Hugging Face with a softmax layer as the base encoder. We use grid search to search the hyper-parameters. We select five different random seeds for the dataset split and training settings among {12, 21, 42, 87, 100}."
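The paper presents its self-training procedure as Algorithm 1 but does not release code. To make the teacher–student loop concrete, here is a minimal, dependency-free sketch of the general idea of uncertainty-aware pseudo-label selection for token tagging. This is an illustrative approximation, not the authors' algorithm: `predict_with_uncertainty`, the majority-vote disagreement score, and the `threshold` value are all assumptions standing in for the paper's Monte Carlo dropout estimates and Bayesian selection.

```python
def predict_with_uncertainty(model, sentence, n_passes=5):
    """Run several stochastic forward passes (standing in for MC dropout)
    and use per-token disagreement as an uncertainty estimate.
    `model` is any callable mapping a token list to a label list."""
    runs = [model(sentence) for _ in range(n_passes)]
    labels, uncertainties = [], []
    for token_preds in zip(*runs):
        # majority vote across passes gives the pseudo-label
        majority = max(set(token_preds), key=token_preds.count)
        labels.append(majority)
        # fraction of passes disagreeing with the majority = uncertainty
        uncertainties.append(1 - token_preds.count(majority) / len(token_preds))
    return labels, uncertainties

def self_training_round(teacher, unlabeled, threshold=0.2):
    """Keep only pseudo-labeled sentences whose mean token uncertainty is
    below `threshold`; the selected pairs would then train the student."""
    selected = []
    for sentence in unlabeled:
        labels, unc = predict_with_uncertainty(teacher, sentence)
        if sum(unc) / len(unc) < threshold:
            selected.append((sentence, labels))
    return selected
```

A confident (consistent) teacher passes all of its pseudo-labels through, while a teacher whose passes disagree has its sentences filtered out, which is the intuition behind rejecting noisy pseudo-labels in low-resource self-training.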
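The "greedy-based sampling strategy" quoted in the Dataset Splits and Experiment Setup rows is not spelled out in the report. One common reading of greedy K-shot sampling for sequence labeling can be sketched as follows; the function name, the sentence-acceptance rule, and the treatment of "O" tokens are assumptions, not the authors' exact procedure.

```python
from collections import Counter

def greedy_k_shot_sample(sentences, labels, k=10):
    """Greedily scan sentences and keep one only if it contributes to an
    entity class still below k instances, stopping once every class has
    at least k. `labels` holds one tag sequence per sentence; non-entity
    'O' tags do not count toward any class."""
    counts = Counter()
    classes = {tag for seq in labels for tag in seq if tag != "O"}
    chosen = []
    for sent, tags in zip(sentences, labels):
        entity_tags = [t for t in tags if t != "O"]
        if not entity_tags:
            continue
        # accept the sentence only if some contained class is still under-filled
        if any(counts[t] < k for t in entity_tags):
            chosen.append((sent, tags))
            counts.update(entity_tags)
        if all(counts[c] >= k for c in classes):
            break
    return chosen
```

Because sentences carry multiple entity mentions, a greedy pass like this can overshoot k for some classes; the paper selects "10-shot labeled data per class", so k would be 10 for both the training and validation splits.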
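The reported protocol of repeating each experiment over five fixed seeds ({12, 21, 42, 87, 100}) and aggregating results can be sketched as below. The `run_experiment` body is a placeholder, not the authors' training code; a real run would also seed NumPy and call `torch.manual_seed` before building the model.

```python
import random
import statistics

SEEDS = [12, 21, 42, 87, 100]  # the five seeds used for splits and training

def run_experiment(seed):
    """Hypothetical stand-in for one full train/evaluate cycle: seeds the
    RNG so the run is reproducible, then returns a mock F1 score."""
    random.seed(seed)
    return 60.0 + random.random() * 5.0  # placeholder metric, not real results

scores = [run_experiment(s) for s in SEEDS]
mean_f1 = statistics.mean(scores)
std_f1 = statistics.stdev(scores)
print(f"F1 = {mean_f1:.2f} +/- {std_f1:.2f} over {len(SEEDS)} seeds")
```

Fixing the seed list (rather than drawing seeds at random) is what lets the dataset splits and training runs be reproduced exactly across reruns.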