Uncertainty-Aware Self-Training for Low-Resource Neural Sequence Labeling
Authors: Jianing Wang, Chengyu Wang, Jun Huang, Ming Gao, Aoying Zhou
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over six benchmarks demonstrate that our SeqUST framework effectively improves the performance of self-training, and consistently outperforms strong baselines by a large margin in low-resource scenarios. |
| Researcher Affiliation | Collaboration | 1 School of Data Science and Engineering, East China Normal University, Shanghai, China 2 Alibaba Group, Hangzhou, China 3 KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China |
| Pseudocode | Yes | Algorithm 1: Self-training Procedure of SeqUST (a hedged sketch of the uncertainty-aware pseudo-labeling step follows the table) |
| Open Source Code | No | The paper does not explicitly state that the source code is released or provide a link to a code repository. |
| Open Datasets | Yes | We choose six widely used benchmarks to evaluate our SeqUST framework, including SNIPS (Coucke et al. 2018) and MultiWOZ (Budzianowski et al. 2018) for slot filling, MIT Movie (Liu et al. 2013b), MIT Restaurant (Liu et al. 2013a), CoNLL-03 (Sang and Meulder 2003) and OntoNotes (Weischedel et al. 2013) for NER. |
| Dataset Splits | Yes | For each dataset, we use a greedy-based sampling strategy to randomly select 10-shot labeled data per class for the few-shot labeled training set and validation set, while the remaining data are viewed as unlabeled data. (A sampling sketch follows the table.) |
| Hardware Specification | Yes | We implement our framework in PyTorch 1.8 and use NVIDIA V100 GPUs for experiments. |
| Software Dependencies | Yes | We implement our framework in PyTorch 1.8 and use NVIDIA V100 GPUs for experiments. |
| Experiment Setup | Yes | For each dataset, we use a greedy-based sampling strategy to randomly select 10-shot labeled data per class for the few-shot labeled training set and validation set, while the remaining data are viewed as unlabeled data. During self-training, the teacher and student model share the same model architecture. By default, we choose BERT-base-uncased (Devlin et al. 2019) from Hugging Face with a softmax layer as the base encoder. We use grid search to search the hyper-parameters. We select five different random seeds for the dataset split and training settings among {12, 21, 42, 87, 100}. (A minimal encoder-setup sketch follows the table.) |
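
The Dataset Splits row above describes a greedy-based strategy that samples 10-shot labeled data per class and treats the rest as unlabeled. Below is a minimal sketch of such a greedy K-shot sampler; the data format (a list of `(tokens, tags)` sentences with BIO tags) and the function name `greedy_k_shot_sample` are illustrative assumptions, not the authors' released code.

```python
# Hypothetical greedy K-shot per-class sampler (illustration only).
import random
from collections import Counter

def greedy_k_shot_sample(sentences, k=10, seed=42):
    """Greedily pick sentences until every entity class has at least k spans."""
    rng = random.Random(seed)
    pool = list(range(len(sentences)))
    rng.shuffle(pool)

    classes = {tag.split("-", 1)[1] for _, tags in sentences
               for tag in tags if tag != "O"}
    counts = Counter()          # entity class -> number of sampled spans
    selected = []

    for idx in pool:
        _, tags = sentences[idx]
        sent_classes = Counter(tag.split("-", 1)[1]
                               for tag in tags if tag.startswith("B-"))
        # Take the sentence only if it adds a class still below k shots.
        if any(counts[c] < k for c in sent_classes):
            selected.append(idx)
            counts.update(sent_classes)
        if classes and all(counts[c] >= k for c in classes):
            break

    chosen = set(selected)
    unlabeled = [i for i in pool if i not in chosen]
    return selected, unlabeled

# Toy usage: with k=1 both sentences are selected as labeled data.
toy = [(["play", "jazz"], ["O", "B-genre"]), (["rate", "it"], ["B-rating", "O"])]
labeled_idx, unlabeled_idx = greedy_k_shot_sample(toy, k=1)
```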
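
The Pseudocode row points to Algorithm 1, the self-training procedure of SeqUST, in which a teacher pseudo-labels unlabeled tokens and only reliable ones are kept for the student. The sketch below illustrates one common way to do uncertainty-aware pseudo-labeling with Monte Carlo dropout; the toy tagger, the entropy measure, and the threshold are assumptions for illustration, not a reproduction of the paper's algorithm.

```python
# Illustrative MC-dropout pseudo-labeling for token classification (assumed setup).
import torch
import torch.nn as nn

class ToyTagger(nn.Module):
    def __init__(self, vocab=1000, dim=64, num_labels=5):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.drop = nn.Dropout(0.3)
        self.out = nn.Linear(dim, num_labels)

    def forward(self, x):                          # x: (batch, seq_len)
        return self.out(self.drop(self.emb(x)))   # (batch, seq_len, num_labels)

@torch.no_grad()
def mc_dropout_pseudo_labels(model, x, passes=10, max_entropy=0.5):
    model.train()                                  # keep dropout active at inference
    probs = torch.stack([model(x).softmax(-1) for _ in range(passes)])
    mean_probs = probs.mean(0)                     # average over stochastic passes
    entropy = -(mean_probs * mean_probs.clamp_min(1e-9).log()).sum(-1)
    pseudo = mean_probs.argmax(-1)                 # hard pseudo-label per token
    keep = entropy < max_entropy                   # low-uncertainty token mask
    return pseudo, keep

teacher = ToyTagger()
tokens = torch.randint(0, 1000, (2, 8))            # two dummy sentences of length 8
pseudo_labels, keep_mask = mc_dropout_pseudo_labels(teacher, tokens)
```

Only tokens where `keep_mask` is true would contribute to the student's loss in a self-training round; the selection criterion here is a stand-in for whatever Algorithm 1 actually prescribes.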
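
The Experiment Setup row states that the base encoder is BERT-base-uncased from Hugging Face with a softmax layer, and that seeds are drawn from {12, 21, 42, 87, 100}. A minimal sketch of that encoder setup follows; the placeholder label set and input sentence are assumptions, and the SeqUST training loop itself is not reproduced.

```python
# Minimal sketch of the stated base encoder: BERT-base-uncased with a
# token-classification head (labels below are placeholders).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

torch.manual_seed(42)  # one of the five reported seeds: {12, 21, 42, 87, 100}

labels = ["O", "B-ENT", "I-ENT"]                   # placeholder label set
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

enc = tokenizer("book a table at an italian restaurant", return_tensors="pt")
logits = model(**enc).logits                       # (1, seq_len, num_labels)
```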