Kernel Density Estimation for Text-Based Geolocation

Authors: Mans Hulden, Miikka Silfverberg, Jerid Francom

AAAI 2015

Reproducibility Variable Result LLM Response
Research Type Experimental For geolocation of tweets we obtain improvements upon non-kernel methods on datasets of U.S. and global Twitter content. The main results of our experiments on the test set of GEOTEXT are given in Table 1.
Researcher Affiliation Academia Mans Hulden, University of Colorado Boulder (mans.hulden@colorado.edu); Miikka Silfverberg, University of Helsinki (miikka.silfverberg@helsinki.fi); Jerid Francom, Wake Forest University (francojc@wfu.edu)
Pseudocode No The paper describes methods using mathematical formulas but does not include any structured pseudocode or algorithm blocks.
Open Source Code Yes The program code and relevant instructions for running all experiments are available at our website (http://geoloc-kde.googlecode.com). We release the main program, GEOLOC, as a stand-alone utility for geolocating arbitrary documents using the methods described in this paper, and also the WORLDTWEETS dataset.
Open Datasets Yes For our first experiments, we have used the GEOTEXT geotagged corpus... it has the advantage of public availability. We release the main program, GEOLOC, as a stand-alone utility for geolocating arbitrary documents using the methods described in this paper, and also the WORLDTWEETS dataset.
Dataset Splits Yes We use the training/test/dev splits that come with the dataset and are used elsewhere, yielding 5,685 documents in the training set and 1,895 documents in the development and test sets. We held out 10,000 tweets for development and 10,000 for testing.
Hardware Specification No The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory specifications) used for running the experiments.
Software Dependencies No The paper does not specify particular software dependencies (e.g., library names with version numbers) needed to replicate the experiments.
Experiment Setup Yes We tune the following parameters for the density estimation method: (1) the standard deviation of the two-dimensional Gaussian: σ, (2) the vocabulary threshold h, (3) the prior β for words. The document/cell prior α is fixed at 1. A coarse grid search over σ, β, and h (threshold) was used to fix σ, after which a finer-grained 3D grid search was used to tune β, h (0-20), and the grid size in degrees (0.5, 1, 2, 5, 10).
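
The two-stage tuning quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustration under stated assumptions, not the GEOLOC implementation: the function names `two_stage_search` and `toy_error`, the `evaluate` callback, the candidate values for σ and β, and the choice to run the coarse stage at the first listed grid size are all hypothetical. Only the h range (0-20) and the grid sizes (0.5, 1, 2, 5, 10) come from the quoted text.

```python
from itertools import product


def two_stage_search(evaluate, sigmas, coarse_betas, coarse_hs,
                     fine_betas, fine_hs, grid_sizes):
    """Coarse search fixes sigma; a finer 3D search tunes beta, h, grid size.

    `evaluate(sigma, beta, h, grid_size)` is a hypothetical callback that
    returns a development-set error (lower is better).
    """
    # Stage 1: coarse search over (sigma, beta, h).
    # Assumption: the coarse stage runs at one fixed grid size.
    default_grid = grid_sizes[0]
    best_sigma, _, _ = min(
        product(sigmas, coarse_betas, coarse_hs),
        key=lambda p: evaluate(p[0], p[1], p[2], default_grid),
    )

    # Stage 2: with sigma fixed, finer search over (beta, h, grid size).
    best_beta, best_h, best_grid = min(
        product(fine_betas, fine_hs, grid_sizes),
        key=lambda p: evaluate(best_sigma, p[0], p[1], p[2]),
    )
    return best_sigma, best_beta, best_h, best_grid


def toy_error(sigma, beta, h, grid_size):
    # Stand-in objective with a known minimum, used only to exercise the
    # search; a real run would score geolocation error on the dev set.
    return ((sigma - 3.0) ** 2 + (beta - 0.1) ** 2
            + (h - 5) ** 2 + (grid_size - 1) ** 2)


best = two_stage_search(
    toy_error,
    sigmas=[1.0, 3.0, 5.0],            # hypothetical candidates
    coarse_betas=[0.01, 0.1, 1.0],     # hypothetical candidates
    coarse_hs=[0, 10, 20],
    fine_betas=[0.05, 0.1, 0.2],       # hypothetical candidates
    fine_hs=range(0, 21),              # h in 0-20, as quoted
    grid_sizes=[0.5, 1, 2, 5, 10],     # degrees, as quoted
)
print(best)
```

Fixing σ first keeps the second stage three-dimensional, which bounds the number of evaluations, at the cost of possibly missing interactions between σ and the other parameters.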