Using Frame Semantics for Knowledge Extraction from Twitter
Authors: Anders Søgaard, Barbara Plank, Hector Martinez Alonso
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We collect tweets about 60 entities in Freebase and compare four methods to extract binary relation candidates, based on syntactic and semantic parsing and a simple mechanism for factuality scoring. The extracted facts are manually evaluated in terms of their correctness and relevance for search. |
| Researcher Affiliation | Academia | Anders Søgaard, Barbara Plank, and Hector Martinez Alonso; Center for Language Technology, University of Copenhagen, Denmark; soegaard@hum.ku.dk |
| Pseudocode | No | The paper mentions creating "software for frame semantic annotation of POS tagged text with a web browser interface" but does not include any pseudocode or algorithm blocks within its text. |
| Open Source Code | No | The paper mentions third-party tools like MATE-TOOLS and REVERB, and refers to the SEMAFOR system, but it does not provide source code for the authors' own methodology, data processing, or experimental setup. Footnote 3 points to a GitHub repository for an annotated Twitter corpus used in previous studies, which is data, not the authors' source code. |
| Open Datasets | Yes | In order to evaluate the quality of frame semantic parsing on Twitter intrinsically, we make a multiply frame-annotated dataset of tweets publicly available. [...] Rather than annotating raw text from scratch, we chose to annotate the development and evaluation splits of an annotated Twitter corpus used in previous studies (Ritter et al. 2011; Derczynski et al. 2013).3 [Footnote 3: https://github.com/aritter/twitter_nlp] |
| Dataset Splits | Yes | Rather than annotating raw text from scratch, we chose to annotate the development and evaluation splits of an annotated Twitter corpus used in previous studies (Ritter et al. 2011; Derczynski et al. 2013). The splits are those provided by Derczynski et al. (2013). (See the loading sketch below the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to conduct the experiments. |
| Software Dependencies | No | The paper mentions several software components and tools used (e.g., MATE-TOOLS, SEMAFOR, REVERB, Twitter-adapted POS tagger), but it does not provide specific version numbers for any of these, nor for any programming languages or libraries. |
| Experiment Setup | Yes | We select 60 entities in Freebase distributed equally across persons, locations and organizations... and extract 70k tweets. [...] We part-of-speech (POS) tag these tweets and pass the augmented tweets on to four different extraction models: a syntactic dependency parser, a semantic role labeler, a frame semantic parser, and a rule-based off-the-shelf (REVERB) open information extraction system. For all systems, except REVERB, we apply the same heuristics to filter out relevant facts and rank them in terms of factuality using sentiment analysis. [...] We had three professional annotators (cf. Table 4) annotate the top 100 fact candidates from each system. The facts were rated as INTELLIGIBLE, TRUE, OPINIONATED and RELEVANT. (See the pipeline sketch below the table.) |
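
The annotated Twitter corpus cited in the Open Datasets and Dataset Splits rows lives in the aritter/twitter_nlp repository. The sketch below shows one way such a corpus could be read, assuming a CoNLL-style layout with one whitespace-separated token/label pair per line and blank lines between tweets; the file path and the column layout are assumptions for illustration, not details confirmed by the paper.

```python
# Minimal sketch: read a CoNLL-style annotated tweet file, as distributed
# in the aritter/twitter_nlp repository referenced above. The path and the
# "token label" two-column layout are assumptions, not facts from the paper.

def read_conll_tweets(path):
    """Yield one tweet at a time as a list of (token, label) pairs."""
    tweet = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:  # blank line assumed to separate tweets
                if tweet:
                    yield tweet
                    tweet = []
            else:
                parts = line.split()  # delimiter (space vs. tab) is an assumption
                tweet.append((parts[0], parts[-1]))
    if tweet:  # handle a file that does not end with a blank line
        yield tweet


if __name__ == "__main__":
    # Hypothetical file name inside the repository, for illustration only.
    tweets = list(read_conll_tweets("data/annotated/ner.txt"))
    print(f"loaded {len(tweets)} tweets")
```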
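The Experiment Setup row outlines a pipeline: POS-tag tweets, extract binary relation candidates with one of four parser-based systems, then filter and rank the candidates by factuality using sentiment analysis. The sketch below is a minimal, hypothetical rendering of that flow; the `toy_extract` function, the opinion-cue lexicon, and the scoring rule are placeholder stand-ins, not the authors' heuristics, which the paper does not specify in enough detail to reproduce here.

```python
# Hypothetical sketch of the extract-then-rank flow described above.
# Every component is a stand-in: real systems would plug in a dependency
# parser, SRL system, frame-semantic parser, or REVERB as the extractor.
from typing import Callable, Iterable

Triple = tuple[str, str, str]  # (subject, relation, object)

# Stand-in sentiment lexicon: opinion cues lower a tweet's factuality score.
OPINION_CUES = {"awesome", "terrible", "love", "hate", "worst", "best"}


def factuality_score(tweet: str) -> float:
    """Score a tweet as more factual when it carries fewer opinion cues.

    A placeholder for the paper's 'simple mechanism for factuality scoring'
    based on sentiment analysis; the real scoring is not specified above.
    """
    opinionated = sum(1 for t in tweet.lower().split() if t in OPINION_CUES)
    return 1.0 / (1.0 + opinionated)


def rank_candidates(
    tweets: Iterable[str],
    extract: Callable[[str], list[Triple]],
    top_k: int = 100,
) -> list[tuple[float, Triple]]:
    """Extract relation candidates and keep the top_k by source-tweet factuality."""
    scored = [
        (factuality_score(tweet), triple)
        for tweet in tweets
        for triple in extract(tweet)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def toy_extract(tweet: str) -> list[Triple]:
    """Toy extractor standing in for the four parser-based systems."""
    words = tweet.split()
    return [(words[0], words[1], " ".join(words[2:]))] if len(words) >= 3 else []


if __name__ == "__main__":
    sample = [
        "Copenhagen hosts the climate summit",
        "I hate Mondays at the worst office",
    ]
    for score, triple in rank_candidates(sample, toy_extract):
        print(f"{score:.2f}  {triple}")
```

Ranking by the factuality of the source tweet mirrors the paper's premise that opinionated tweets are less likely to yield true, search-relevant facts, which is also why the annotators rated candidates as OPINIONATED alongside TRUE and RELEVANT.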