Using Frame Semantics for Knowledge Extraction from Twitter

Authors: Anders Søgaard, Barbara Plank, Hector Martinez Alonso

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We collect tweets about 60 entities in Freebase and compare four methods to extract binary relation candidates, based on syntactic and semantic parsing and a simple mechanism for factuality scoring. The extracted facts are manually evaluated in terms of their correctness and relevance for search.
Researcher Affiliation | Academia | Anders Søgaard, Barbara Plank, and Hector Martinez Alonso, Center for Language Technology, University of Copenhagen, Denmark; soegaard@hum.ku.dk
Pseudocode | No | The paper mentions creating "software for frame semantic annotation of POS tagged text with a web browser interface" but does not include any pseudocode or algorithm blocks within its text.
Open Source Code | No | The paper mentions third-party tools like MATE-TOOLS and REVERB, and refers to the SEMAFOR system, but it does not provide source code for the authors' own methodology, data processing, or experimental setup. Footnote 3 points to a GitHub repository for an annotated Twitter corpus used in previous studies, which is data, not the authors' source code.
Open Datasets | Yes | In order to evaluate the quality of frame semantic parsing on Twitter intrinsically, we make a multiply frame-annotated dataset of tweets publicly available. [...] Rather than annotating raw text from scratch, we chose to annotate the development and evaluation splits of an annotated Twitter corpus used in previous studies (Ritter et al. 2011; Derczynski et al. 2013).3 [Footnote 3: https://github.com/aritter/twitter_nlp]
Dataset Splits | Yes | Rather than annotating raw text from scratch, we chose to annotate the development and evaluation splits of an annotated Twitter corpus used in previous studies (Ritter et al. 2011; Derczynski et al. 2013). The splits are those provided by (Derczynski et al. 2013).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments.
Software Dependencies | No | The paper mentions several software components and tools used (e.g., MATE-TOOLS, SEMAFOR, REVERB, Twitter-adapted POS tagger), but it does not provide specific version numbers for any of these, nor for any programming languages or libraries.
Experiment Setup | Yes | We select 60 entities in Freebase distributed equally across persons, locations and organizations... and extract 70k tweets. [...] We part of speech (POS) tag these tweets and pass the augmented tweets on to four different extraction models: a syntactic dependency parser, a semantic role labeler, a frame semantic parser, and a rule-based off-the-shelf (REVERB) open information extraction system. For all systems, except REVERB, we apply the same heuristics to filter out relevant facts and rank them in terms of factuality using sentiment analysis. [...] We had three professional annotators (cf. Table 4) annotate the top 100 fact candidates from each system. The facts were rated as INTELLIGIBLE, TRUE, OPINIONATED and RELEVANT.
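The experiment setup quoted above (POS-tag tweets, run extractors, filter candidates, rank by factuality via sentiment) can be illustrated with a minimal sketch. Note this is not the authors' code, which is not released: every function below is a simplified stand-in with hypothetical names, since the actual components (the Twitter-adapted POS tagger, MATE-TOOLS, SEMAFOR, REVERB) are external systems not reproduced here.

```python
# Hypothetical sketch of the paper's extraction-and-ranking pipeline.
# All bodies are toy stand-ins for the real tools named in the paper.

def pos_tag(tweet):
    """Stand-in POS tagger: naively tags every token as a noun."""
    return [(tok, "N") for tok in tweet.split()]

def extract_candidates(tagged_tweet):
    """Stand-in extractor: emits consecutive token triples as
    (arg1, rel, arg2) binary-relation candidates."""
    toks = [t for t, _ in tagged_tweet]
    return [(toks[i], toks[i + 1], toks[i + 2])
            for i in range(len(toks) - 2)]

def factuality_score(candidate):
    """Stand-in factuality scorer: penalizes opinion-bearing words,
    mimicking the paper's use of sentiment analysis for ranking."""
    opinion_words = {"love", "hate", "awesome", "terrible"}
    return -sum(w.lower() in opinion_words for w in candidate)

def rank_facts(tweets, top_k=100):
    """POS-tag each tweet, extract candidates, rank by factuality,
    and keep the top_k (the paper evaluated the top 100 per system)."""
    candidates = []
    for tweet in tweets:
        candidates.extend(extract_candidates(pos_tag(tweet)))
    return sorted(candidates, key=factuality_score, reverse=True)[:top_k]

facts = rank_facts(["Copenhagen is the capital of Denmark",
                    "I love Copenhagen"], top_k=3)
```

In the sketch, opinionated candidates such as ("I", "love", "Copenhagen") score lower and fall out of the top-ranked list, which is the intuition behind using sentiment analysis as a factuality proxy.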