GeoLLM: Extracting Geospatial Knowledge from Large Language Models
Authors: Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David B. Lobell, Stefano Ermon
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Across these tasks, our method demonstrates a 70% improvement in performance (measured using Pearson's r²) relative to baselines that use nearest neighbors or use information directly from the prompt, and performance equal to or exceeding satellite-based benchmarks in the literature. |
| Researcher Affiliation | Academia | Corresponding author, rohinm@cs.stanford.edu; Stanford University; University of Georgia |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available on the project website: https://rohinmanvi.github.io/GeoLLM |
| Open Datasets | Yes | WorldPop (Tatem, 2017)... WorldPop & CIESIN, 2018. URL https://dx.doi.org/10.5258/SOTON/WP00647. and SustainBench (Yeh et al., 2021) includes tasks derived from survey data from the Demographic and Health Surveys (DHS) program (DHS, 2023). URL http://www.dhsprogram.com. and The United States Census Bureau (USCB) (US, 2023)... URL https://data.census.gov/table. and Zillow provides the Zillow Home Value Index (ZHVI) (ZR, 2023)... URL https://www.zillow.com/research/data/. |
| Dataset Splits | Yes | We use 2,000 samples for our test and validation sets across all tasks. We split the data into training, test, and validation partitions early in the process before sampling different sizes. |
| Hardware Specification | Yes | With a single V100 GPU, finetuning never takes more than 2 hours. and With QLoRA, finetuning never takes more than 2 hours for 10,000 samples with a single A100 GPU. |
| Software Dependencies | No | The paper mentions software tools like OpenAI's fine-tuning API, Nominatim, Overpass API, XGBoost, fastText, and BERT, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use a learning rate of 1e-5 with AdamW optimizer, training for 8 epochs with a batch size of 16, warmup ratio of 0.1, weight decay of 0.1, and a cosine learning rate scheduler. and For LoRA, we use rank of 64, alpha of 16, and a dropout probability of 0.1. We also use 4-bit quantization with 4-bit Normal Float. We train for 4 epochs with bfloat16 mixed precision training, a batch size of 8, gradient accumulation steps of 2, gradient checkpoints enabled, a maximum gradient norm of 0.3, an initial learning rate of 1.5e-3 for the AdamW optimizer, a weight decay of 0.001, a cosine learning rate scheduler, and a warmup ratio of 0.03. |
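
The Research Type row reports relative improvement measured with Pearson's r². A minimal sketch of that metric, assuming a NumPy/SciPy implementation (function and variable names are illustrative, not from the paper):

```python
# Minimal sketch of the evaluation metric quoted above (Pearson's r^2).
# The paper reports the metric, not this implementation.
import numpy as np
from scipy.stats import pearsonr

def pearson_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Squared Pearson correlation between ground-truth values and predictions."""
    r, _ = pearsonr(y_true, y_pred)
    return r ** 2

# A relative improvement over a baseline, as quoted in the table, would be:
# improvement = (pearson_r2(y, pred_new) - pearson_r2(y, pred_base)) / pearson_r2(y, pred_base)
```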
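
The Dataset Splits row describes holding out fixed test and validation sets of 2,000 samples each before subsampling training sets of different sizes. A sketch of that protocol under the assumption of a simple random permutation (the seed and helper name are hypothetical):

```python
# Sketch of the split protocol quoted in the Dataset Splits row: fix the test and
# validation partitions first, then draw training subsets of varying size from the
# remaining pool. Seed, ordering, and helper name are assumptions.
import numpy as np

def split_then_subsample(data: np.ndarray, train_size: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    test_idx, val_idx, train_pool = idx[:2000], idx[2000:4000], idx[4000:]
    train_idx = train_pool[:train_size]  # vary train_size; test/val stay fixed
    return data[train_idx], data[val_idx], data[test_idx]
```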
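
The Experiment Setup row lists two hyperparameter sets: a full fine-tuning configuration and a QLoRA configuration. The sketch below shows how they might be expressed with the Hugging Face transformers, peft, and bitsandbytes libraries; the paper reports the hyperparameters but not this exact code, and the output paths are illustrative.

```python
# Sketch of the reported configurations, assuming the Hugging Face stack.
import torch
from transformers import TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig

# QLoRA setting: 4-bit NormalFloat quantization with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=64,                                # LoRA rank
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

qlora_args = TrainingArguments(
    output_dir="geollm-qlora",           # illustrative path
    num_train_epochs=4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=1.5e-3,
    weight_decay=0.001,
    max_grad_norm=0.3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,                           # bfloat16 mixed precision
    optim="adamw_torch",
)

# Full fine-tuning setting reported in the same row, shown for reference.
full_ft_args = TrainingArguments(
    output_dir="geollm-full-ft",         # illustrative path
    num_train_epochs=8,
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    weight_decay=0.1,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
)
```

The quantization, mixed precision, and gradient checkpointing choices in the QLoRA setting are consistent with the single-GPU budget quoted in the Hardware Specification row.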