LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
Authors: Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Min Zhang, Tat-Seng Chua
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Over 12 IE benchmarks across 7 tasks, our system shows significant improvements over the baseline UIE system. Further in-depth analyses show that our GLM learns rich task-adaptive structural bias that greatly resolves the UIE crux: the long-range dependence issue and boundary identification. |
| Researcher Affiliation | Academia | 1 Sea-NExT Joint Lab, School of Computing, National University of Singapore; 2 Wuhan University; 3 Harbin Institute of Technology (Shenzhen) |
| Pseudocode | No | The paper describes its methods and processes using textual descriptions and mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Our resources can be found at https://github.com/ChocoWu/LasUIE. |
| Open Datasets | Yes | We use the plain texts from the Wikipedia and BookCorpus corpora for the post-training. To cover all three UIE prototypes, we consider 7 representative IE tasks with corresponding data: 1) NER: CoNLL03 [66], OntoNote [53], ACE04 [11], ACE05 [22]; 2) RE: CoNLL04 [57], NYT [56], ACE05 [22]; 3) AOP: Res14 [51]; 4) ASTE: Res14 [51]; 5) ORL: MPQA [74]; 6) SRL: CoNLL12 [53]; 7) EE: ACE05 [22]. Our resources can be found at https://github.com/ChocoWu/LasUIE. Wikipedia corpus: https://autonlp.ai/datasets/wikipedia-news-corpus; BookCorpus: https://huggingface.co/datasets/bookcorpus |
| Dataset Splits | Yes | Each dataset has its own split, and we follow the same practice of the relevant prior works when using it. |
| Hardware Specification | No | The paper mentions 'GPU-based training' in the 'Potential impact and limitations' section but does not specify any particular GPU models, CPU types, or detailed hardware configurations used for the experiments; only a general hardware type is given. |
| Software Dependencies | No | The paper states 'We take the pre-trained T5 Base as default backbone GLM' and mentions using 'BART [32] and T5 [55]'. However, it does not specify version numbers for T5, BART, or any other software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | No | The paper describes the model architecture and training stages, but it does not provide specific experimental setup details such as learning rates, batch sizes, number of epochs, optimizer settings, or other hyperparameter values in the main text. |