Scale Up Event Extraction Learning via Automatic Training Data Generation
Authors: Ying Zeng, Yansong Feng, Rong Ma, Zheng Wang, Rui Yan, Chongde Shi, Dongyan Zhao
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach by using the knowledge extracted from Freebase to label texts from Wikipedia articles. Experimental results show that our approach can generate a large number of high-quality training instances. |
| Researcher Affiliation | Academia | 1 Institute of Computer Science and Technology, Peking University, P.R. China, 2 School of Computing and Communications, Lancaster University, UK, 3 Institute of Scientific and Technical Information of China |
| Pseudocode | No | The paper does not contain any blocks explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not provide a statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | On ACE: We also test our strategy on the ACE dataset. We first collect all annotated events, without triggers, as the knowledge base to compute the importance values for all arguments, and select the key arguments for each ACE event type accordingly. We follow IMP&TIME+DIS to examine every sentence whether it can be selected as an annotated instance within the ACE event types. Eventually, we correctly obtain 3,448 sentences as positive instances, covering 64.7% of the original ACE dataset. (Doddington et al. 2004) |
| Dataset Splits | Yes | Our final dataset, FBWiki, using IMP&TIME+DIS, contains 46,735 positive sentences and 79,536 negative ones, with a random split of 101,019 for training and 25,252 for testing. All hyper-parameters are tuned on a development split in the training set. (A sketch of this split follows the table.) |
| Hardware Specification | No | The paper discusses model parameters and training configurations but does not provide specific hardware details (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper mentions software like Gurobi (Gurobi Optimization 2016), CRF++ toolkit (Kudo 2005), and skip-gram word2vec (Mikolov et al. 2013), but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | During event detection, we set the size of word embeddings to 200, the size of LSTM layer to 100. In argument detection, we use the same size of word embedding, while the size of LSTM layer is 150, and the size of key argument embedding is 50. Word embeddings are pre-trained using skip-gram word2vec (Mikolov et al. 2013) on English Wikipedia and fine tuned during training. We apply dropout (0.5) on both input and output layers. |
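
For concreteness, the sketch below reproduces the kind of split described in the Dataset Splits row. It is a guess at the pipeline, not the authors' code: the input file name `fbwiki_sentences.jsonl`, the random seed, and the 10% development fraction are assumptions; only the 25,252-sentence test set, the 101,019-sentence training set, and the fact that the development split is carved out of the training portion come from the paper.

```python
# Hypothetical reconstruction of the FBWiki train/dev/test split.
# Assumptions: one labelled sentence per line in "fbwiki_sentences.jsonl",
# a fixed seed, and a 10% development fraction; only the 101,019/25,252
# train/test sizes are stated in the paper.
import json
import random

random.seed(13)  # arbitrary; the paper does not report a seed

with open("fbwiki_sentences.jsonl", encoding="utf-8") as f:
    sentences = [json.loads(line) for line in f]    # 46,735 positive + 79,536 negative

random.shuffle(sentences)

TEST_SIZE = 25_252                                  # size reported in the paper
test, train = sentences[:TEST_SIZE], sentences[TEST_SIZE:]

dev_size = len(train) // 10                         # dev fraction is an assumption
dev, train = train[:dev_size], train[dev_size:]

print(f"train={len(train)}, dev={len(dev)}, test={len(test)}")
```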
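
The Experiment Setup row pins down layer sizes and dropout but not the full architecture. The PyTorch sketch below wires those numbers together as one plausible configuration; the class names, vocabulary and label-set sizes, the unidirectional LSTMs, and the linear output layers are assumptions (the authors' models may, for instance, be bidirectional or use a different decoder). Only the 200-d word embeddings, the 100-d and 150-d LSTM layers, the 50-d key-argument embeddings, and the 0.5 dropout on input and output come from the paper.

```python
# Minimal PyTorch sketch of the hyper-parameters in the Experiment Setup row.
# Vocabulary and label sizes, class names, and output heads are assumptions.
import torch
import torch.nn as nn

VOCAB_SIZE = 100_000      # assumption; the paper uses pre-trained word2vec embeddings
NUM_EVENT_TYPES = 25      # assumption
NUM_ARG_ROLES = 30        # assumption
NUM_KEY_ARGS = 10         # assumption


class EventDetector(nn.Module):
    """Event detection: 200-d word embeddings, 100-d LSTM, 0.5 dropout."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 200)      # fine-tuned during training
        self.in_drop = nn.Dropout(0.5)                  # dropout on the input layer
        self.lstm = nn.LSTM(200, 100, batch_first=True)
        self.out_drop = nn.Dropout(0.5)                 # dropout on the output layer
        self.classify = nn.Linear(100, NUM_EVENT_TYPES)

    def forward(self, token_ids):
        x = self.in_drop(self.embed(token_ids))
        _, (h, _) = self.lstm(x)                        # final hidden state as sentence vector
        return self.classify(self.out_drop(h[-1]))


class ArgumentDetector(nn.Module):
    """Argument detection: 200-d word + 50-d key-argument embeddings, 150-d LSTM."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, 200)
        self.key_arg_embed = nn.Embedding(NUM_KEY_ARGS, 50)
        self.in_drop = nn.Dropout(0.5)
        self.lstm = nn.LSTM(200 + 50, 150, batch_first=True)
        self.out_drop = nn.Dropout(0.5)
        self.classify = nn.Linear(150, NUM_ARG_ROLES)

    def forward(self, token_ids, key_arg_ids):
        x = torch.cat([self.embed(token_ids),
                       self.key_arg_embed(key_arg_ids)], dim=-1)
        out, _ = self.lstm(self.in_drop(x))
        return self.classify(self.out_drop(out))        # per-token role scores
```

To match the row's description, pre-trained skip-gram word2vec vectors would be copied into `embed.weight` before training and left trainable, so the embeddings are fine-tuned along with the rest of the model.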