GoLLIE: Annotation Guidelines improve Zero-Shot Information Extraction

Authors: Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluation empirically demonstrates that GoLLIE is able to generalize to and follow unseen guidelines, outperforming previous attempts at zero-shot information extraction. The ablation study shows that detailed guidelines are key for good results.
Researcher Affiliation | Academia | Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri, Oier Lopez de Lacalle, German Rigau, Eneko Agirre; HiTZ Basque Center for Language Technology - Ixa NLP Group, University of the Basque Country (UPV/EHU); {oscar.sainz, iker.garciaf}@ehu.eus
Pseudocode | No | The paper includes Python code examples for the input/output representation (e.g., Figures 2, 3, 5, and 6), but these illustrate the data format and are not presented as structured pseudocode or algorithm blocks (see the class-based representation sketch after the table).
Open Source Code | Yes | Code, data, and models are publicly available: https://github.com/hitz-zentroa/GoLLIE.
Open Datasets | Yes | Table 1: Datasets used in the experiments. The table shows the domain, tasks, and whether each dataset is used for training, evaluation, or both. ACE05 (Walker et al., 2006) News ... CoNLL 2003 (Tjong Kim Sang & De Meulder, 2003) News
Dataset Splits | Yes | Regarding the splits, we use the standard train, dev, and test splits for every dataset. In the case of ACE, we follow the split provided by Lin et al. (2020). In the case of CASIE, we took the first 200 instances as validation and the last 2000 as test (see the slicing sketch after the table).
Hardware Specification | Yes | Our training infrastructure was 2 NVIDIA A100 GPUs with 80GB each. ... The QLoRA approach was trained using just one NVIDIA A100 80GB GPU thanks to the 4-bit quantization of the frozen model (Dettmers et al., 2023). Training the full model required a minimum of four NVIDIA A100 80GB GPUs to fit the model into memory.
Software Dependencies | No | The paper mentions using QLoRA and DeepSpeed but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | The models were trained for 3 epochs with an effective batch size of 32 and a learning rate of 3e-4 with a cosine scheduler (see the training-config sketch after the table).
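
The class-based representation mentioned in the Pseudocode row can be illustrated as follows. This is a minimal sketch: GoLLIE encodes annotation guidelines as docstrings of Python dataclasses and expects the model to emit annotations as instantiated objects; the label set and guideline wording below are assumptions for illustration, not copied from the paper's figures.

```python
# Minimal sketch of GoLLIE's class-based input/output format. The label
# definitions and guideline wording are illustrative assumptions; the
# paper's Figures 2, 3, 5, and 6 show the actual prompts.
from dataclasses import dataclass

@dataclass
class Person:
    """Person entities are limited to humans; mentions may be
    named, pronominal, or nominal."""
    span: str  # surface form of the mention in the input text

@dataclass
class Organization:
    """Organization entities are limited to companies, agencies,
    and other formally constituted groups of people."""
    span: str

text = "Eneko Agirre works at the University of the Basque Country."

# Expected model completion: annotations as a list of instantiated objects.
result = [
    Person(span="Eneko Agirre"),
    Organization(span="University of the Basque Country"),
]
```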
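The CASIE split rule from the Dataset Splits row reduces to simple list slicing. A minimal sketch, assuming the instances are loaded in their original order (the loading step itself is not described here):

```python
# Sketch of the CASIE split rule: first 200 instances for validation,
# last 2000 for test. `instances` stands in for the dataset loaded in
# its original order; it is a placeholder, not the paper's loader.
instances: list = [...]

validation = instances[:200]    # first 200 instances
test = instances[-2000:]        # last 2000 instances
```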
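The Hardware Specification and Experiment Setup rows together describe a QLoRA fine-tune. The sketch below wires the reported hyperparameters (3 epochs, effective batch size 32, learning rate 3e-4, cosine scheduler, 4-bit quantized frozen base model) into a Hugging Face transformers/peft configuration; the base model name, LoRA rank/alpha, and per-device batch split are assumptions, not values taken from the paper.

```python
# Hedged sketch of a QLoRA training setup consistent with the reported
# hyperparameters. Base model, LoRA rank/alpha, and the per-device
# batch split are assumptions.
import torch
from transformers import (AutoModelForCausalLM, BitsAndBytesConfig,
                          TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "codellama/CodeLlama-7b-hf"  # assumption: a Code-LLaMA base

bnb_config = BitsAndBytesConfig(          # 4-bit quantization of the frozen model
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(                 # adapter settings are assumptions
    r=8, lora_alpha=16, lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(                 # hyperparameters from the table above
    output_dir="gollie-qlora",
    num_train_epochs=3,
    per_device_train_batch_size=8,        # 8 x 4 accumulation = effective 32
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
)
```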