AUC Maximization for Low-Resource Named Entity Recognition
Authors: Ngoc Dang Nguyen, Wei Tan, Lan Du, Wray Buntine, Richard Beare, Changyou Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct extensive experiments to demonstrate the advantages of our method under the low-resource and highly-imbalanced data distribution settings. |
| Researcher Affiliation | Academia | Ngoc Dang Nguyen 1, Wei Tan 1, Lan Du 1*, Wray Buntine 2, Richard Beare 1, Changyou Chen 3. 1Department of Data Science and Artificial Intelligence, Monash University; 2College of Engineering and Computer Science, VinUniversity; 3Department of Computer Science and Engineering, University at Buffalo |
| Pseudocode | Yes | Algorithm 1: NER two-task prediction. Input: S_test, w_en, w_be. Output: ŷ_1, ..., ŷ_n_test, s.t. ŷ_i ∈ {B, I, O}^l |
| Open Source Code | Yes | The code of this work is available at https://github.com/dngu0061/NER-AUC-2T. |
| Open Datasets | Yes | Both CoNLL 2003 (Tjong Kim Sang and De Meulder 2003) and OntoNotes5 (Weischedel et al. 2014) are standard corpora from the general domain to evaluate and benchmark the NER performance. Whereas NCBI (Doğan, Leaman, and Lu 2014), LINNAEUS (Gerner, Nenadic, and Bergman 2010), and s800 (Pafilis et al. 2013) are standard NER corpora used for biomedical named entity recognition. |
| Dataset Splits | Yes | Table 1: The size of the benchmark NER/bioNER corpora as well as the label distribution for each corpus. In this work, we use the well-defined BIO tagging scheme, i.e., B-Begin, I-Inside, and O-Outside of NEs. Dataset / # sentences (Train, Dev, Test) / # tokens (Train, Dev, Test) / % label (B/I/O): CoNLL 2003 (2003) 14,987 3,466 3,684 203,621 51,362 46,435... |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions software components like `libauc` and pre-trained models but does not provide specific version numbers for any key software dependencies. |
| Experiment Setup | Yes | For the rest of the paper, we present the results for AUC-2T and COMAUC-2T with λ = 100 during training. |
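The paper maximizes an AUC surrogate to cope with the highly imbalanced B/I/O label distribution (its released code builds on the `libauc` library). As a minimal illustrative sketch only, not the authors' implementation, the following shows a generic pairwise squared-hinge surrogate for 1 − AUC on one binary subtask (entity token vs. O token); the `margin` hyperparameter and the toy scores are assumptions for illustration:

```python
import numpy as np

def pairwise_auc_loss(scores, labels, margin=1.0):
    """Squared-hinge pairwise surrogate for 1 - AUC.

    scores: model scores per token, shape (n,)
    labels: 1 for entity tokens (B/I), 0 for O tokens
    Penalizes positive/negative pairs whose score gap falls below `margin`.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # All pairwise differences s_pos - s_neg, shape (n_pos, n_neg).
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(np.maximum(0.0, margin - diff) ** 2))

# Perfectly separated scores with a gap >= margin incur zero loss.
scores = np.array([2.0, 1.5, -1.0, -0.5])
labels = np.array([1, 1, 0, 0])
print(pairwise_auc_loss(scores, labels))  # → 0.0
```

Because the loss ranks every positive token against every negative token rather than counting per-token errors, it is insensitive to the overwhelming majority of O tokens, which is the motivation for AUC maximization in this low-resource setting.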