CASIE: Extracting Cybersecurity Event Information from Text

Authors: Taneeya Satyapanich, Francis Ferraro, Tim Finin

AAAI 2020, pp. 8749-8757

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have conducted experiments on each component in the event detection pipeline and the results show that each subsystem performs well.
Researcher Affiliation | Academia | Taneeya Satyapanich, Francis Ferraro, Tim Finin; Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD 21250 USA; {taneeya1, ferraro, finin}@umbc.edu
Pseudocode | No | The paper describes its architecture and various neural network components using textual descriptions and block diagrams, but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | We make our corpus, annotations and code publicly available (Satyapanich 2019a). Satyapanich, T. 2019a. CASIE Repository. https://github.com/Ebiquity/CASIE.
Open Datasets | Yes | We collected about 5,000 cybersecurity news articles (Cyberwire 2019). These news articles were published in 2017-2019. About 1,000 of them, which mention our five events, were annotated by three experienced computer scientists, using majority vote to select the final annotations. We make our corpus, annotations and code publicly available (Satyapanich 2019a).
Dataset Splits | Yes | We developed CASIE using 8-fold cross-validation on a training set of 900 articles, with 100 articles held out for testing.
Hardware Specification | No | The paper does not explicitly describe any specific hardware specifications (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions software tools and models like 'Core NLP', 'DBpedia Spotlight', 'Wikidata', 'Word2vec', and 'BERT-Base Uncased model', but does not provide specific version numbers for these or other key software components or libraries.
Experiment Setup | Yes | We kept all of the word embeddings as input and experimentally found that using the fourth-to-last hidden layer gave the best development performance. Early experiments showed that an attention size of five gave the best performance, which was further improved by using a tanh activation function. The number of nodes of every layer is equal to a half of the number in the previous layer, except for the attention layer (where the output and input sizes are equal) and the CRF layer (where the output size is equal to the number of output classes).
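The split described in the Dataset Splits row (8-fold cross-validation over 900 training articles, with 100 articles held out for testing) can be sketched in plain Python. This is a minimal illustration only: the paper does not say how articles were assigned to folds, so the contiguous fold assignment below is an assumption, not the authors' code.

```python
def eight_fold_splits(n_train=900, k=8):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation.

    Sketch of the setup reported for CASIE: 900 training articles split
    into 8 folds; the separate 100-article test set is not touched here.
    Contiguous fold assignment is illustrative, not the authors' method.
    """
    indices = list(range(n_train))
    base = n_train // k       # 900 // 8 = 112
    remainder = n_train % k   # first `remainder` folds get one extra article
    folds, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < remainder else 0)
        folds.append(indices[start:end])
        start = end
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

splits = list(eight_fold_splits())
```

Each of the 8 iterations trains on 7 folds and validates on the remaining one, so every training article serves as validation data exactly once.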
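The layer-sizing rule quoted in the Experiment Setup row (each layer outputs half the width of the previous one, except the attention layer, which preserves its input width, and the CRF layer, which outputs one unit per class) can be illustrated with a small helper. The starting width, layer count, and class count below are hypothetical examples, not values from the paper.

```python
def layer_widths(input_size, num_layers, attention_at=None, crf_classes=None):
    """Return output widths following the halving rule described in the paper:
    each layer outputs half the previous width, an attention layer keeps its
    input width, and a final CRF layer outputs one unit per class.
    """
    widths = []
    size = input_size
    for i in range(num_layers):
        if attention_at is not None and i == attention_at:
            out = size            # attention layer: output width equals input width
        elif crf_classes is not None and i == num_layers - 1:
            out = crf_classes     # CRF layer: output width equals number of classes
        else:
            out = size // 2       # default: halve the previous layer's width
        widths.append(out)
        size = out
    return widths

# Hypothetical example: 768-dim BERT embeddings in, 4 layers,
# attention as the second layer, CRF over 5 classes.
print(layer_widths(768, 4, attention_at=1, crf_classes=5))  # → [384, 384, 192, 5]
```

The halving widths form a funnel toward the output; only the attention and CRF layers break the pattern, exactly as the quoted description states.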