Reliable Data Generation and Selection for Low-Resource Relation Extraction

Authors: Junjie Yu, Xing Wang, Wenliang Chen

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experimentation on three datasets with low-resource settings, we demonstrate the effectiveness of our proposed approach in constructing annotated data and achieving noteworthy improvements in comparison to multiple baselines.
Researcher Affiliation | Collaboration | Junjie Yu (1), Xing Wang (2), Wenliang Chen (1,*); (1) School of Computer Science and Technology, Soochow University, Suzhou, China; (2) Tencent AI Lab, Shenzhen, China; jjyu@stu.suda.edu.cn, brightxwang@tencent.com, wlchen@suda.edu.cn
Pseudocode | Yes | Algorithm 1: Sentence Selection and Training. Input: seed training data D_seed, triplets T, and generator M_g. Hyperparameter: number of sentences generated for each triplet, K. Output: selected data D_sel and relation extractor M_re. (A hedged sketch of this loop follows the table.)
Open Source Code | Yes | Code, data and models are available at https://github.com/jjyunlp/Generation_RE.
Open Datasets | Yes | To verify our Self-RDGS approach, we conduct experiments on three datasets, including two human-annotated datasets and one DS-annotated dataset. SemEval: a human-annotated dataset from SemEval-2010 Task 8 (Hendrickx et al. 2010)... Re-TACRED: a revised version of the human-annotated dataset TACRED (Zhang et al. 2017) proposed by Stoica, Platanios, and Póczos (2021). NYT10m: an updated version of the widely used DS dataset NYT10 (Riedel, Yao, and McCallum 2010)...
Dataset Splits | Yes | To enhance the realism of low-resource scenarios, we do not create a separate validation set in our approach. Instead, the seed data serves as the validation set while the automatically generated sentences serve as training data.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU models, CPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions models such as GPT-2 large, LLaMA2-7B-chat, ChatGLM2-6B, and BERT-base, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Relation Extraction Training: We utilize BERT-base (Devlin et al. 2018) to build the RE models. Throughout the training process, we set the learning rate to 5e-5 and maintain a batch size of 32, according to the performance on the validation set. The model is trained for a maximum of 20 epochs, and early stopping is determined by the validation performance. (A hedged configuration sketch of these settings follows the table.)
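
The Pseudocode row reproduces only the header of Algorithm 1, so the following Python sketch spells out one plausible reading of the loop. The `generate` and `train_re` callables, the dictionary-based `Example` type, and the selection rule (keep a generated sentence when an extractor trained on the seed data predicts the triplet's relation) are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch of the Algorithm 1 loop summarized in the Pseudocode row.
# Everything below is an illustrative reading: the paper's exact prompting,
# selection criterion, and training details are not reproduced here.
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]   # (head entity, relation, tail entity)
Example = Dict[str, str]         # keys: "sentence", "head", "tail", "relation"


def select_and_train(
    d_seed: List[Example],
    triplets: List[Triplet],
    generate: Callable[[Triplet, int], List[Example]],            # generator M_g
    train_re: Callable[[List[Example]], Callable[[Example], str]],
    k: int,
):
    """Input: seed data D_seed, triplets T, generator M_g; hyperparameter K.
    Output: selected data D_sel and relation extractor M_re."""
    # 1. Generate K candidate sentences for every triplet with the generator.
    candidates: List[Example] = []
    for triplet in triplets:
        candidates.extend(generate(triplet, k))

    # 2. Select reliable sentences. As one plausible criterion (an assumption),
    #    keep a candidate when an extractor trained on the seed data predicts
    #    the relation of the triplet it was generated from.
    seed_extractor = train_re(d_seed)
    d_sel = [ex for ex in candidates if seed_extractor(ex) == ex["relation"]]

    # 3. Train the final extractor on the selected sentences, with the seed
    #    data doubling as the validation set (see the Dataset Splits row).
    m_re = train_re(d_sel)
    return d_sel, m_re
```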
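
The Experiment Setup and Dataset Splits rows pin down a handful of concrete training choices: BERT-base as the extractor, a 5e-5 learning rate, batch size 32, at most 20 epochs, early stopping on validation performance, and the seed data standing in for the validation set. The sketch below wires those numbers into a Hugging Face `Trainer` configuration; the use of the `Trainer` API, the `bert-base-uncased` checkpoint, the loss-based early-stopping metric, and the patience value are assumptions, since the paper does not state them.

```python
# Hedged sketch of the reported training configuration. Only the values marked
# "as reported" come from the paper; the rest are illustrative assumptions.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)


def build_trainer(train_dataset, seed_dataset, num_relations: int) -> Trainer:
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_relations
    )
    args = TrainingArguments(
        output_dir="re_model",
        learning_rate=5e-5,                 # as reported
        per_device_train_batch_size=32,     # as reported
        num_train_epochs=20,                # maximum epochs, as reported
        evaluation_strategy="epoch",        # evaluate on the seed (validation) data
        save_strategy="epoch",
        load_best_model_at_end=True,        # required for early stopping
        metric_for_best_model="eval_loss",  # assumption; the paper reports F1
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,        # selected generated sentences
        eval_dataset=seed_dataset,          # seed data doubles as validation set
        tokenizer=tokenizer,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience is a guess
    )
```

The `train_dataset` and `seed_dataset` arguments are assumed to be already tokenized into model inputs with relation labels attached.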