Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Towards Domain-Specific Semantic Relatedness: A Case Study from Geography

Authors: Shilad Sen, Isaac Johnson, Rebecca Harper, Huy Mai, Samuel Horlbeck Olsen, Benjamin Mathers, Laura Souza Vonessen, Matthew Wright, Brent Hecht

IJCAI 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In the experiments reported below, we find evidence that domain-specific approaches to SR can be remarkably effective. Specifically, we show that a domain-specific geography-enhanced SR measure (GESR) that intelligently extends general SR with geography-specific signals (e.g. distance, containment) significantly outperforms the state-of-the-art in general SR for within-domain SR assessment (Spearman's correlation of 0.810 vs. 0.656). |
| Researcher Affiliation | Academia | Shilad Sen, Macalester College; Isaac Johnson, University of Minnesota; Rebecca Harper, Willamette College; Huy Mai, Brandeis University; Samuel Horlbeck Olsen, Macalester College; Benjamin Mathers, Macalester College; Laura Souza Vonessen, University of Arizona; Matthew Wright, University of Minnesota; Brent Hecht, University of Minnesota |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | This is the first dataset of its kind, and we are releasing it along with a reference implementation of GESR to advance both geography-specific SR research and domain-specific SR research more generally. (Footnote 1: https://github.com/shilad/geo-sr) |
| Open Datasets | Yes | Our findings are enabled by a novel gold standard dataset of relatedness estimates for pairs of places (i.e. geographic concepts) that we collected for this paper. ... This is the first dataset of its kind, and we are releasing it along with a reference implementation of GESR to advance both geography-specific SR research and domain-specific SR research more generally. (Footnote 1: https://github.com/shilad/geo-sr) |
| Dataset Splits | Yes | We used gradient-boosted trees [Ganjisaffar, Caruana, and Lopes 2011] as implemented in the scikit-learn machine learning library with seven-fold cross-validation. |
| Hardware Specification | No | No specific hardware (e.g., CPU, GPU models, memory size, or cloud instance types) used for running the experiments was mentioned in the paper. |
| Software Dependencies | No | The paper mentions the use of the "scikit-learn machine learning library" and "Sen et al.'s WikiBrain toolkit" but does not provide specific version numbers for these software components, which are necessary for full reproducibility. |
| Experiment Setup | Yes | We used gradient-boosted trees [Ganjisaffar, Caruana, and Lopes 2011] as implemented in the scikit-learn machine learning library with seven-fold cross-validation. ... We log transformed the four distance metrics (arc, ordinal, countries, states) because they exhibited right-skewed distributions. All features and metrics exhibited 100% coverage for the 754 concept pairs except for countries-between (96.8% coverage) and states-between (94.4%) due to the nature of continents and the oceans that surround them. For missing data, we impute the maximum values for each distance metric. |
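
The preprocessing and evaluation steps quoted in the Experiment Setup and Research Type entries (maximum-value imputation, log transform, seven-fold splitting, Spearman's correlation) can be sketched in standard-library Python. This is an illustrative sketch, not the authors' implementation: the function names, the use of `None` for missing values, the choice of `log1p` rather than a plain logarithm, and the unshuffled contiguous folds are all assumptions, and the gradient-boosted model itself is left out.

```python
import math


def impute_max_and_log(values):
    """Fill missing entries (None) with the column maximum, then log-transform.

    Assumption: log1p is used so that zero distances stay finite; the paper
    only says the distance metrics were "log transformed".
    """
    col_max = max(v for v in values if v is not None)
    filled = [col_max if v is None else v for v in values]
    return [math.log1p(v) for v in filled]


def k_fold_indices(n_items, k=7):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy column with one missing countries-between value.
    print(impute_max_and_log([0.0, 2.0, None]))
    # Fold sizes for the 754 concept pairs mentioned in the paper.
    print([len(test) for _, test in k_fold_indices(754, 7)])
```

In a full pipeline, a model would be fitted on each `train` split and its predictions on the matching `test` split compared with the gold-standard relatedness scores via `spearman`. With scikit-learn, that model would presumably be something like `GradientBoostingRegressor`, though the paper does not name the exact class or its hyperparameters.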