Kernel Density Estimation for Text-Based Geolocation
Authors: Mans Hulden, Miikka Silfverberg, Jerid Francom
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For geolocation of tweets we obtain improvements upon non-kernel methods on datasets of U.S. and global Twitter content. The main results of our experiments on the test set of GEOTEXT are given in table 1. |
| Researcher Affiliation | Academia | Mans Hulden University of Colorado Boulder mans.hulden@colorado.edu Miikka Silfverberg University of Helsinki miikka.silfverberg@helsinki.fi Jerid Francom Wake Forest University francojc@wfu.edu |
| Pseudocode | No | The paper describes methods using mathematical formulas but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The program code and relevant instructions for running all experiments are available at our website (http://geoloc-kde.googlecode.com). We release the main program, GEOLOC, as a stand-alone utility for geolocating arbitrary documents using the methods described in this paper, and also the WORLDTWEETS dataset. |
| Open Datasets | Yes | For our first experiments, we have used the GEOTEXT geotagged corpus... it has the advantage of public availability. We release the main program, GEOLOC, as a stand-alone utility for geolocating arbitrary documents using the methods described in this paper, and also the WORLDTWEETS dataset. |
| Dataset Splits | Yes | We use the training/test/dev splits that come with the dataset and are used elsewhere, yielding 5,685 documents in the training set and 1,895 documents in the development and test sets. We held out 10,000 tweets for development and 10,000 for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not specify particular software dependencies (e.g., library names with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | We tune the following parameters for the density estimation method: (1) the standard deviation of the two-dimensional Gaussian: σ, (2) the vocabulary threshold h, (3) the prior β for words. The document/cell prior α is fixed at 1. A coarse grid search over σ, β, and h (threshold) was used to fix σ, after which a finer-grained 3D grid search was used to tune β, h (0-20), and the grid size in degrees (0.5, 1, 2, 5, 10). |
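
The two-stage tuning procedure quoted under Experiment Setup can be sketched roughly as follows. This is a minimal illustration, not the authors' GEOLOC code: the function names, the candidate values for σ and β, and the toy objective `toy_dev_error` are all assumptions made for the example. Only the h range (0-20) and the grid sizes in degrees (0.5, 1, 2, 5, 10) come from the quoted text. In a real run, `dev_error` would be replaced by a function that trains the density estimator with the given parameters and returns an error measure on the development set.

```python
from itertools import product


def grid_search(score, grid):
    """Exhaustively minimize `score` over the Cartesian product of `grid`.

    `grid` maps parameter names to lists of candidate values.
    Returns (best_params, best_score).
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


def two_stage_search(dev_error, coarse, fine):
    """Coarse search to fix sigma, then a finer search over the rest."""
    # Stage 1: coarse grid over sigma, beta, h; only sigma is kept.
    coarse_best, _ = grid_search(dev_error, coarse)
    sigma = coarse_best["sigma"]
    # Stage 2: finer 3D grid over beta, h, and cell size with sigma fixed.
    fine_best, err = grid_search(
        lambda **params: dev_error(sigma=sigma, **params), fine
    )
    return {"sigma": sigma, **fine_best}, err


def toy_dev_error(sigma, beta, h, cell_deg):
    """Hypothetical stand-in for development-set error (a quadratic bowl)."""
    return (sigma - 3.0) ** 2 + (beta - 0.1) ** 2 + (h - 5) ** 2 + (cell_deg - 1.0) ** 2


# Candidate values for sigma and beta are assumptions; the source does not list them.
coarse = {
    "sigma": [1.0, 3.0, 9.0],
    "beta": [0.01, 0.1, 1.0],
    "h": [0, 10, 20],
    "cell_deg": [1.0],  # cell size held at a single default in the coarse stage
}
fine = {
    "beta": [0.01, 0.05, 0.1, 0.5, 1.0],
    "h": list(range(0, 21)),              # h in 0-20, as quoted
    "cell_deg": [0.5, 1, 2, 5, 10],       # grid size in degrees, as quoted
}

best, err = two_stage_search(toy_dev_error, coarse, fine)
print(best, err)
```

Fixing σ after the coarse pass keeps the second stage three-dimensional, which is what makes a finer resolution over β, h, and the cell size affordable.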