Language Model Priming for Cross-Lingual Event Extraction
Authors: Steven Fincke, Shantanu Agarwal, Scott Miller, Elizabeth Boschee
AAAI 2022, pp. 10627–10635
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that by enabling the language model to better compensate for the deficits of sparse and noisy training data, our approach improves both trigger and argument detection and classification significantly over the state of the art in a zero-shot cross-lingual setting. |
| Researcher Affiliation | Academia | University of Southern California Information Sciences Institute |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 1, 2, 3, 4) but no explicit pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide a link or explicit statement about releasing its own source code. It mentions using third-party tools like spaCy, UDPipe, and Farasa, and a codebase for data splits from DyGIE++. |
| Open Datasets | Yes | We report results in two experimental settings, both using the ACE 2005 corpus (English and Arabic): https://www.ldc.upenn.edu/collaborations/past-projects/ace |
| Dataset Splits | Yes | Our primary experimental setting uses the standard English document train/dev/test splits for this dataset (Yang and Mitchell 2016) and the Arabic splits proposed by Xu et al. (2021). |
| Hardware Specification | No | The paper mentions using specific language models like "BERT (Devlin et al. 2019)" and "XLM-RoBERTa (Conneau et al. 2020)", but it does not specify any hardware details (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions several software components such as "torchcrf", "spaCy", "UDPipe", and "Farasa", and links to documentation or cites papers for some of them, but it does not provide the specific version numbers needed for full reproducibility. |
| Experiment Setup | Yes | All models fine tune all the layers of the language model and only use the output from the final layer. ... All results reported in this paper other than Table 2 use this experimental setting and are the average of five seeds. ... For language models we use the large, cased version of BERT (Devlin et al. 2019) for the monolingual English condition and the large version of XLM-RoBERTa (Conneau et al. 2020) for cross-lingual or Arabic-only conditions. |
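
The Experiment Setup row names the encoders and the five-seed averaging, but the paper does not state which loading framework was used. Below is a minimal, hypothetical sketch assuming the Hugging Face `transformers` library and its `bert-large-cased` / `xlm-roberta-large` checkpoints; the model identifiers and seed values are assumptions, not details from the paper.

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical mapping of experimental condition to pretrained checkpoint.
# The paper specifies BERT large cased (English-only) and XLM-RoBERTa large
# (cross-lingual / Arabic-only); the Hugging Face IDs are our assumption.
MODELS = {
    "monolingual_english": "bert-large-cased",       # Devlin et al. 2019
    "cross_lingual_or_arabic": "xlm-roberta-large",  # Conneau et al. 2020
}

# The paper averages results over five seeds but does not list them;
# these values are placeholders.
SEEDS = [0, 1, 2, 3, 4]

def load_encoder(setting: str):
    """Load the pretrained encoder for a given experimental condition.

    All layers are left trainable, matching the paper's statement that every
    layer of the language model is fine-tuned and only the final layer's
    output is used downstream.
    """
    name = MODELS[setting]
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    for param in model.parameters():
        param.requires_grad = True  # fine-tune all layers
    return tokenizer, model
```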