ExTaSem! Extending, Taxonomizing and Semantifying Domain Terminologies

Authors: Luis Espinosa-Anke, Horacio Saggion, Francesco Ronzano, Roberto Navigli

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | EXTASEM! achieves state-of-the-art results in the following taxonomy evaluation experiments: (1) Hypernym discovery, (2) Reconstructing gold standard taxonomies, and (3) Taxonomy quality according to structural measures.
Researcher Affiliation | Academia | Luis Espinosa-Anke*, Horacio Saggion*, Francesco Ronzano* and Roberto Navigli. *DTIC TALN Research Group, Universitat Pompeu Fabra, Carrer Tànger 122-134, 08018 Barcelona (Spain); Department of Computer Science, Sapienza University of Rome, Viale Regina Elena 295, Rome (Italy). {luis.espinosa, horacio.saggion, francesco.ronzano}@upf.edu, navigli@di.uniroma1.it
Pseudocode | Yes | Algorithm 1 Taxonomy Induction. Input: threshold θ, weighted paths P^φ_W. Output: disambiguated domain taxonomy T^φ. /* A(term, T^φ) denotes the set of ancestors of term in T^φ */ T^φ ← ∅; for ρ^φ_τ ∈ P^φ_W do: if w(ρ^φ_τ) > θ then for (term, hyp) ∈ ρ^φ_τ do: if hyp ∉ A(term, T^φ) then T^φ ← T^φ ∪ {term → hyp}; return T^φ. (A hedged Python sketch of this induction loop follows the table.)
Open Source Code | No | The paper states 'Taxonomies available at http://taln.upf.edu/extasem.' but does not explicitly state that the source code for the methodology described in the paper is released or available.
Open Datasets | Yes | We evaluated on SemEval-2015 Task 17 (TExEval) domains... For each domain, two terminologies and their corresponding gold standard taxonomies were available. Such gold standards came from both domain-specific sources (e.g. for chem., the ChEBI taxonomy) and the WordNet subgraph rooted at the domain concept (e.g. the WordNet subtree rooted at chemical in the case of chem.)... We also evaluated a baseline method based on substring inclusion consisting in creating a hyponym-hypernym pair between two terms if one is prefix or suffix substring of the other... We trained a model with the WCL corpus, a manually validated dataset (Navigli, Velardi, and Ruiz-Martínez 2010). (A hedged sketch of the substring-inclusion baseline follows the table.)
Dataset Splits | No | The paper describes using several gold standard taxonomies for evaluation (e.g., TExEval 2015, expert-created 100-term samples) and states that models were 'trained on manually-validated data' (WCL corpus), but it does not provide specific details on train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splitting methodology) for reproduction.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory specifications) used to run its experiments.
Software Dependencies | No | The paper mentions 'CRF++' and 'a dependency parser (Bohnet 2010)' but does not provide specific version numbers for these or any other key software components.
Experiment Setup | Yes | In all the experiments reported in this paper, we set κ equal to 10... We trained a Conditional Random Fields (Lafferty, McCallum, and Pereira 2001) model with a word-level context window of [3, -3]... We empirically set a threshold θ to .135, and apply it over all domains. (A hedged context-window feature sketch follows the table.)
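
For concreteness, here is a minimal Python sketch of the induction loop quoted in the Pseudocode row. The data layout (a dict mapping each term to its set of direct hypernyms, and weighted paths given as (weight, list of (term, hypernym) pairs) tuples) is an assumption made for illustration, not the authors' implementation.

```python
def ancestors(taxonomy, term):
    """Return the set of ancestors of `term` in `taxonomy` (dict: term -> set of hypernyms)."""
    seen, stack = set(), list(taxonomy.get(term, ()))
    while stack:
        hyp = stack.pop()
        if hyp not in seen:
            seen.add(hyp)
            stack.extend(taxonomy.get(hyp, ()))
    return seen


def induce_taxonomy(weighted_paths, theta):
    """Greedy induction over weighted hypernym paths, following the quoted pseudocode.

    `weighted_paths` is assumed to be an iterable of (weight, path) tuples, where each
    path is a list of (term, hypernym) pairs; `theta` is the weight threshold.
    """
    taxonomy = {}
    for weight, path in weighted_paths:
        if weight > theta:
            for term, hyp in path:
                # Add the edge term -> hyp only if hyp is not already an ancestor of term.
                if hyp not in ancestors(taxonomy, term):
                    taxonomy.setdefault(term, set()).add(hyp)
    return taxonomy


paths = [(0.80, [("algebra", "mathematics"), ("mathematics", "science")]),
         (0.05, [("algebra", "tool")])]
print(induce_taxonomy(paths, theta=0.135))
# {'algebra': {'mathematics'}, 'mathematics': {'science'}}
```

The second path is discarded because its weight falls below the quoted threshold of .135, so only edges from sufficiently reliable hypernym paths enter the taxonomy.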
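The Open Datasets row also quotes the substring-inclusion baseline. Below is a minimal sketch under one reading of that description; taking the contained (shorter) term as the hypernym is an assumption, since the quoted sentence does not fix the direction of the pair.

```python
from itertools import permutations


def substring_baseline(terms):
    """Create (hyponym, hypernym) pairs whenever one term is a prefix or
    suffix substring of the other (sketch of the quoted baseline)."""
    pairs = set()
    for long_term, short_term in permutations(terms, 2):
        if long_term != short_term and (
            long_term.startswith(short_term) or long_term.endswith(short_term)
        ):
            # Assumption: the contained (shorter) term acts as the hypernym.
            pairs.add((long_term, short_term))
    return pairs


print(substring_baseline(["chemistry", "organic chemistry", "chemical element"]))
# {('organic chemistry', 'chemistry')}
```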
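Finally, the Experiment Setup row mentions a CRF trained with a word-level context window of [3, -3] (CRF++, trained on the WCL corpus, per the quotes above). As a rough illustration only, the Python sketch below builds context-window features for one token in the generic feature-dictionary style accepted by CRF toolkits such as sklearn-crfsuite; the feature names, padding token and library choice are assumptions, not the authors' setup.

```python
def token_features(tokens, i, window=3):
    """Word-level context-window features for token i, covering offsets
    -window .. +window (a rough stand-in for a CRF++ feature template)."""
    feats = {"bias": 1.0, "word[0]": tokens[i].lower()}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Pad positions that fall outside the sentence.
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


sentence = "A taxonomy is a hierarchical classification of entities".split()
print(token_features(sentence, 2))  # features around the token "is"
```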