Towards Domain-Specific Semantic Relatedness: A Case Study from Geography

Authors: Shilad Sen, Isaac Johnson, Rebecca Harper, Huy Mai, Samuel Horlbeck Olsen, Benjamin Mathers, Laura Souza Vonessen, Matthew Wright, Brent Hecht

IJCAI 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In the experiments reported below, we find evidence that domain-specific approaches to SR can be remarkably effective. Specifically, we show that a domain-specific geography-enhanced SR measure (GESR) that intelligently extends general SR with geography-specific signals (e.g. distance, containment) significantly outperforms the state-of-the-art in general SR for within-domain SR assessment (Spearman's correlation of 0.810 vs. 0.656). |
| Researcher Affiliation | Academia | Shilad Sen, Macalester College, ssen@macalester.edu; Isaac Johnson, University of Minnesota, joh12041@umn.edu; Rebecca Harper, Willamette College, rcharper@willamette.edu; Huy Mai, Brandeis University, huymai@brandeis.edu; Samuel Horlbeck Olsen, Macalester College, shorlbec@macalester.edu; Benjamin Mathers, Macalester College, bmathers@macalester.edu; Laura Souza Vonessen, University of Arizona, lvonessen@email.arizona.edu; Matthew Wright, University of Minnesota, mlwright84@gmail.com; Brent Hecht, University of Minnesota, bhecht@cs.umn.edu |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | This is the first dataset of its kind, and we are releasing it along with a reference implementation of GESR to advance both geography-specific SR research and domain-specific SR research more generally. (Footnote 1: https://github.com/shilad/geo-sr) |
| Open Datasets | Yes | Our findings are enabled by a novel gold standard dataset of relatedness estimates for pairs of places (i.e. geographic concepts) that we collected for this paper. ... This is the first dataset of its kind, and we are releasing it along with a reference implementation of GESR to advance both geography-specific SR research and domain-specific SR research more generally. (Footnote 1: https://github.com/shilad/geo-sr) |
| Dataset Splits | Yes | We used gradient-boosted trees [Ganjisaffar, Caruana, and Lopes 2011] as implemented in the scikit-learn machine learning library with seven-fold cross-validation. |
| Hardware Specification | No | No specific hardware (e.g., CPU, GPU models, memory size, or cloud instance types) used for running the experiments was mentioned in the paper. |
| Software Dependencies | No | The paper mentions the use of the "scikit-learn machine learning library" and "Sen et al.'s WikiBrain toolkit" but does not provide specific version numbers for these software components, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We used gradient-boosted trees [Ganjisaffar, Caruana, and Lopes 2011] as implemented in the scikit-learn machine learning library with seven-fold cross-validation. ... We log transformed the four distance metrics (arc, ordinal, countries, states) because they exhibited right-skewed distributions. All features and metrics exhibited 100% coverage for the 754 concept pairs except for countries-between (96.8% coverage) and states-between (94.4%) due to the nature of continents and the oceans that surround them. For missing data, we impute the maximum values for each distance metric. |
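
The preprocessing and evaluation protocol quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation (their reference code is in the linked repository). Several details are assumptions made only for illustration: the dictionary layout with `None` for missing values, the use of `log1p` (the quoted text does not give the exact form of the log transform), the shuffled fold assignment with a fixed seed, and all function names.

```python
import math
import random

# Metric names follow the quoted text; the dict-of-rows layout is an assumption.
DISTANCE_METRICS = ["arc", "ordinal", "countries", "states"]


def impute_max_and_log(rows, metrics=DISTANCE_METRICS):
    """Fill missing distances with the column maximum, then log-transform.

    Missing values are represented as None (assumed). log1p keeps a
    distance of 0 finite; the source does not state the exact transform.
    """
    out = [dict(r) for r in rows]  # copy so the input is left untouched
    for m in metrics:
        col_max = max(r[m] for r in rows if r[m] is not None)
        for r in out:
            value = col_max if r[m] is None else r[m]
            r[m] = math.log1p(value)
    return out


def k_fold_indices(n, k=7, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In a full pipeline, each `(train, test)` split would be used to fit a gradient-boosted tree model (for example scikit-learn's `GradientBoostingRegressor`, though the quoted text names neither the class nor its hyperparameters), and `spearman` would compare the held-out predictions with the gold-standard relatedness ratings.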