Reliable Data Generation and Selection for Low-Resource Relation Extraction
Authors: Junjie Yu, Xing Wang, Wenliang Chen
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experimentation on three datasets with low-resource settings, we demonstrate the effectiveness of our proposed approach in constructing annotated data and achieving noteworthy improvements in comparison to multiple baselines. |
| Researcher Affiliation | Collaboration | Junjie Yu1, Xing Wang2, Wenliang Chen1* 1School of Computer Science and Technology, Soochow University, Suzhou, China 2Tencent AI Lab, Shenzhen, China jjyu@stu.suda.edu.cn, brightxwang@tencent.com, wlchen@suda.edu.cn |
| Pseudocode | Yes | Algorithm 1: Sentence Selection and Training. Input: seed train data D_seed, triplets T, and Generator M_g. Hyperparameter: number of sentences K generated for each triplet. Output: Selected Data D_sel and Relation Extractor M_re. (A hedged sketch of this loop is given after the table.) |
| Open Source Code | Yes | Code, data and models are available at https://github.com/jjyunlp/Generation_RE. |
| Open Datasets | Yes | To verify our Self-RDGS approach, we conduct experiments on three datasets, including two human-annotated datasets and one DS-annotated dataset. SemEval: a human-annotated dataset from SemEval-2010 Task 8 (Hendrickx et al. 2010)... Re-TACRED: a revised version of the human-annotated dataset TACRED (Zhang et al. 2017) proposed by Stoica, Platanios, and Póczos (2021). NYT10m: an updated version of the widely used DS dataset NYT10 (Riedel, Yao, and McCallum 2010)... |
| Dataset Splits | Yes | To enhance the realism of low-resource scenarios, we do not create a separate validation set in our approach. Instead, the seed data serves as the validation set while the automatically generated sentences serve as training data. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU models, CPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions models like GPT-2 large, LLaMA2-7B-chat, ChatGLM2-6B, and BERT-base, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Relation Extraction Training: We utilize BERT-base (Devlin et al. 2018) to build the RE models. Throughout the training process, we set the learning rate to 5e-5 and maintain a batch size of 32, according to the performance on the validation set. The model is trained for a maximum of 20 epochs, and early stopping is determined by the validation performance. (A hedged training-configuration sketch follows the table.) |
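
The Pseudocode row quotes only the header of Algorithm 1 (Sentence Selection and Training). The sketch below is a minimal, assumption-based reconstruction of that loop's control flow from the stated inputs (D_seed, T, M_g), hyperparameter K, and outputs (D_sel, M_re). The `generate`, `score`, and `train_re` callables, the `SelectionConfig` class, and the reliability threshold are hypothetical placeholders, not the authors' implementation; consult the released code for the actual selection criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Triplet = Tuple[str, str, str]   # (head entity, relation, tail entity)
Example = Tuple[str, Triplet]    # (sentence, triplet label)


@dataclass
class SelectionConfig:
    k: int = 5              # K: sentences generated per triplet (hyperparameter)
    threshold: float = 0.5  # hypothetical reliability cut-off, not from the paper


def select_and_train(
    seed_train: Sequence[Example],                    # D_seed
    triplets: Sequence[Triplet],                      # T
    generate: Callable[[Triplet, int], List[str]],    # M_g: triplet -> K sentences
    score: Callable[[str, Triplet], float],           # reliability score (assumed)
    train_re: Callable[[Sequence[Example]], object],  # trains the relation extractor
    cfg: SelectionConfig = SelectionConfig(),
):
    """Return (D_sel, M_re): the selected data and the trained relation extractor."""
    selected: List[Example] = []
    for triplet in triplets:
        # Generate K candidate sentences for this triplet with the generator.
        for sentence in generate(triplet, cfg.k):
            # Keep only sentences judged reliable for this triplet.
            if score(sentence, triplet) >= cfg.threshold:
                selected.append((sentence, triplet))
    # Train the relation extractor on the seed data plus the selected sentences.
    model = train_re(list(seed_train) + selected)
    return selected, model
```

The `score` step stands in for whatever reliability criterion the paper applies; one natural instantiation would be the confidence of an RE model trained on D_seed, but that choice is an assumption here.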
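The hyperparameters quoted in the Experiment Setup row map onto a standard BERT fine-tuning recipe. The sketch below is one plausible reconstruction using Hugging Face Transformers; the library choice, the dataset objects, the F1 metric, and the early-stopping patience are assumptions (the paper does not name its training framework), and exact argument names can vary slightly across transformers versions.

```python
from transformers import (AutoModelForSequenceClassification, EarlyStoppingCallback,
                          Trainer, TrainingArguments)


def train_relation_extractor(train_dataset, eval_dataset, num_relations: int):
    """Fine-tune a BERT-base relation classifier with the reported hyperparameters.

    train_dataset: generated-and-selected sentences (training data).
    eval_dataset: the seed data, which the paper reuses as the validation set.
    """
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased",       # BERT-base encoder, as stated in the paper
        num_labels=num_relations,  # number of relation types in the dataset
    )

    args = TrainingArguments(
        output_dir="re_model",
        learning_rate=5e-5,              # from the paper
        per_device_train_batch_size=32,  # from the paper
        num_train_epochs=20,             # maximum epochs, from the paper
        evaluation_strategy="epoch",     # validate after every epoch
        save_strategy="epoch",
        load_best_model_at_end=True,     # keep the checkpoint with the best validation score
        metric_for_best_model="f1",      # assumption: micro-F1 is typical for RE
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        # Early stopping on validation performance; the patience value is assumed.
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )
    trainer.train()
    return trainer
```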