Low-Resource NER by Data Augmentation With Prompting

Authors: Jian Liu, Yufeng Chen, Jinan Xu

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results have widely confirmed the effectiveness of our approach.
Researcher Affiliation | Academia | Jian Liu, Yufeng Chen and Jinan Xu, Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, School of Computer and Information Technology, China. jianliu@bjtu.edu.cn, chenyf@bjtu.edu.cn, jaxu@bjtu.edu.cn
Pseudocode | No | The paper describes its approach using descriptive text and mathematical equations (e.g., Equations 1-7) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We have made our code available at https://github.com/jianliu-ml/fewNER for further investigation.
Open Datasets | Yes | We use three NER datasets for evaluation: CoNLL 2003 [Tjong Kim Sang and De Meulder, 2003], OntoNotes 5.0 [Hovy et al., 2006], and a real-world low-resource dataset, MaSciP [Mysore et al., 2019].
Dataset Splits | Yes | For each dataset we sample out 50, 150, and 500 sentences (at least one mention of each entity type is included) to create the small (S), medium (M), and large (L) training sets (we use F to indicate the full training set), and we use precision (P), recall (R), and F1 as evaluation metrics. Table 2 gives statistics of the three datasets. ... We use the development set to tune the best iteration step. (See the sampling sketch after this table.)
Hardware Specification | No | The paper mentions using the 'BERT-base cased version' as the backbone and discusses model architectures, but does not provide specific hardware details such as the GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions using the 'BERT-base cased version', 'BiLSTM-CRF', 'GloVe embeddings', and 'Adam' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In label-conditioned word replacement, we empirically set T = 10... In the uncertainty-guided self-training method, we set the number of forward passes K to 10... and select N = 200... We set the batch size to 50... and the learning rate to 1e-2... The batch size is set to 10... and the learning rate is set to 1e-5... We apply Adam [Kingma and Ba, 2015] for model optimization. (See the self-training sketch after this table.)
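
The Dataset Splits row above describes sampling 50, 150, and 500 training sentences per dataset while guaranteeing at least one mention of each entity type. The snippet below is a minimal sketch of how such subsets might be constructed; it assumes BIO-tagged sentences, and the helper name `sample_subset` is hypothetical, not taken from the authors' released code.

```python
import random

def sample_subset(sentences, labels, size, seed=42):
    """Sample `size` sentences so that every entity type appears at least once.

    `sentences` is a list of token lists and `labels` a parallel list of BIO
    tag lists (e.g. ["B-PER", "I-PER", "O"]). Hypothetical sketch, not the
    authors' released sampling code.
    """
    rng = random.Random(seed)

    # Index sentences by the entity types they contain.
    by_type = {}
    for i, tags in enumerate(labels):
        for t in {tag.split("-", 1)[1] for tag in tags if tag != "O"}:
            by_type.setdefault(t, []).append(i)

    # First guarantee coverage: at least one sentence per entity type.
    chosen = set()
    for idxs in by_type.values():
        if not any(i in chosen for i in idxs):
            chosen.add(rng.choice(idxs))

    # Then fill the remaining slots uniformly at random.
    rest = [i for i in range(len(sentences)) if i not in chosen]
    rng.shuffle(rest)
    chosen.update(rest[: max(0, size - len(chosen))])

    return [sentences[i] for i in chosen], [labels[i] for i in chosen]

# e.g. the small (S), medium (M), and large (L) splits described in the paper:
# small_x, small_y = sample_subset(train_sents, train_tags, size=50)
# medium_x, medium_y = sample_subset(train_sents, train_tags, size=150)
# large_x, large_y = sample_subset(train_sents, train_tags, size=500)
```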
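
The Experiment Setup row mentions K = 10 forward passes in the uncertainty-guided self-training method and the selection of N = 200 sentences. One common way to obtain multiple stochastic forward passes is Monte Carlo dropout, sketched below for a HuggingFace-style token-classification model; whether the paper uses exactly this mechanism, and the names `mc_dropout_uncertainty` and `unlabeled_loader`, are assumptions of this sketch.

```python
import torch

def mc_dropout_uncertainty(model, dataloader, k=10):
    """Score sentences by predictive uncertainty using K stochastic passes.

    Assumes a HuggingFace-style token-classification model whose forward call
    returns an object with a `.logits` tensor of shape
    (batch, seq_len, num_labels). MC dropout is an assumption here; the paper
    only states that K forward passes are performed.
    """
    model.train()  # keep dropout layers active at inference time
    scores = []
    with torch.no_grad():
        for batch in dataloader:
            probs = torch.stack(
                [torch.softmax(model(**batch).logits, dim=-1) for _ in range(k)]
            )  # (k, batch, seq_len, num_labels)
            mean_probs = probs.mean(dim=0)
            # Token-level predictive entropy, averaged into a sentence score.
            entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
            scores.append(entropy.mean(dim=-1))
    return torch.cat(scores)

# Keep the N = 200 least-uncertain pseudo-labeled sentences for self-training.
# scores = mc_dropout_uncertainty(model, unlabeled_loader, k=10)
# selected = scores.argsort()[:200]
```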