Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Kernel Density Estimation for Text-Based Geolocation

Authors: Mans Hulden, Miikka Silfverberg, Jerid Francom

AAAI 2015 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For geolocation of tweets we obtain a improvements upon non-kernel methods on datasets of U.S. and global Twitter content. The main results of our experiments of the test set of GEOTEXT are given in table 1.
Researcher Affiliation	Academia	Mans Hulden University of Colorado Boulder EMAIL Miikka Silfverberg University of Helsinki miikka.silfverberg@helsinki.fi Jerid Francom Wake Forest University EMAIL
Pseudocode	No	The paper describes methods using mathematical formulas but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	The program code and relevant instructions for running all experiments are available at our website [footnote 4: http://geoloc-kde.googlecode.com]. We release the main program, GEOLOC, as a stand-alone utility for geolocating arbitrary documents using the methods described in this paper, and also the WORLDTWEETS dataset.
Open Datasets	Yes	For our first experiments, we have used the GEOTEXT geotagged corpus... it has the advantage of public availability [footnote 3]. Additionally: We release the main program, GEOLOC, as a stand-alone utility for geolocating arbitrary documents using the methods described in this paper, and also the WORLDTWEETS dataset.
Dataset Splits	Yes	We use the training/test/dev splits that come with the dataset and are used elsewhere, yielding 5,685 documents in the training set and 1,895 documents in the development and test sets. We held out 10,000 tweets for development and 10,000 for testing.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies	No	The paper does not specify particular software dependencies (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup	Yes	We tune the following parameters for the density estimation method: (1) the standard deviation of the two-dimensional Gaussian: σ, (2) the vocabulary threshold h, (3) the prior β for words. The document/cell prior α is fixed at 1. A coarse grid search over σ, β, and h (threshold) was used to fix σ, after which a finer-grained 3d grid search was used to tune β, h (0-20), and the grid size in degrees (0.5, 1, 2, 5, 10).
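
The two-stage tuning quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' GEOLOC code: the function names, the candidate values for σ and β, and the `toy_dev_error` objective are all hypothetical stand-ins for a real development-set error. Only the h range (0-20) and the grid sizes (0.5, 1, 2, 5, 10) come from the quoted text.

```python
import itertools


def grid_search(score, grid):
    """Exhaustively evaluate every combination in `grid`; return the lowest-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s < best_score:  # strict '<' keeps the first combination on ties
            best_params, best_score = params, s
    return best_params, best_score


def toy_dev_error(sigma, beta, h, cell_size=1.0):
    # Hypothetical objective (assumption): a real run would return the
    # geolocation error measured on the development set.
    return (sigma - 3.0) ** 2 + (beta - 0.1) ** 2 + (h - 5) ** 2 + (cell_size - 1.0) ** 2


# Stage 1: coarse search over sigma, beta, and h, used only to fix sigma.
# The candidate values below are illustrative assumptions.
coarse = {"sigma": [1.0, 3.0, 9.0], "beta": [0.01, 0.1, 1.0], "h": [0, 10, 20]}
coarse_best, _ = grid_search(toy_dev_error, coarse)
sigma = coarse_best["sigma"]

# Stage 2: finer 3D search over beta, h (0-20), and grid size in degrees.
fine = {
    "beta": [0.01, 0.05, 0.1, 0.5, 1.0],   # assumed candidates
    "h": list(range(0, 21)),                # 0-20, as quoted
    "cell_size": [0.5, 1, 2, 5, 10],        # grid sizes, as quoted
}
fine_best, fine_score = grid_search(
    lambda beta, h, cell_size: toy_dev_error(sigma, beta, h, cell_size), fine
)
print(sigma, fine_best, fine_score)
```

Fixing σ after the coarse pass shrinks the second search from four dimensions to three, which is presumably why the quoted setup proceeds in two stages.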