GeoLLM: Extracting Geospatial Knowledge from Large Language Models
Authors: Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David B. Lobell, Stefano Ermon
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods. Across these tasks, our method demonstrates a 70% improvement in performance (measured using Pearson's r²) relative to baselines that use nearest neighbors or use information directly from the prompt, and performance equal to or exceeding satellite-based benchmarks in the literature. |
| Researcher Affiliation | Academia | Corresponding author, rohinm@cs.stanford.edu; Stanford University; University of Georgia |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available on the project website: https://rohinmanvi.github.io/GeoLLM |
| Open Datasets | Yes | WorldPop (Tatem, 2017)... WorldPop & CIESIN, 2018. URL https://dx.doi.org/10.5258/SOTON/WP00647. and SustainBench (Yeh et al., 2021) includes tasks derived from survey data from the Demographic and Health Surveys (DHS) program (DHS, 2023). URL http://www.dhsprogram.com. and The United States Census Bureau (USCB) (US, 2023)... URL https://data.census.gov/table. and Zillow provides the Zillow Home Value Index (ZHVI) (ZR, 2023)... URL https://www.zillow.com/research/data/. |
| Dataset Splits | Yes | We use 2,000 samples for our test and validation sets across all tasks. We split the data into training, test, and validation partitions early in the process before sampling different sizes. |
| Hardware Specification | Yes | With a single V100 GPU, finetuning never takes more than 2 hours. and With QLoRA, finetuning never takes more than 2 hours for 10,000 samples with a single A100 GPU. |
| Software Dependencies | No | The paper mentions software tools like OpenAI's fine-tuning API, Nominatim, Overpass API, XGBoost, fastText, and BERT, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use a learning rate of 1e-5 with AdamW optimizer, training for 8 epochs with a batch size of 16, warmup ratio of 0.1, weight decay of 0.1, and a cosine learning rate scheduler. and For LoRA, we use rank of 64, alpha of 16, and a dropout probability of 0.1. We also use 4-bit quantization with 4-bit Normal Float. We train for 4 epochs with bfloat16 mixed precision training, a batch size of 8, gradient accumulation steps of 2, gradient checkpoints enabled, a maximum gradient norm of 0.3, an initial learning rate of 1.5e-3 for the AdamW optimizer, a weight decay of 0.001, a cosine learning rate scheduler, and a warmup ratio of 0.03. |
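
The Research Type row reports relative improvement measured with Pearson's r². A minimal sketch of that metric, assuming a NumPy/SciPy implementation (function and variable names are illustrative, not from the paper):

```python
# Minimal sketch of the evaluation metric quoted above (Pearson's r^2).
# The paper reports the metric, not this implementation.
import numpy as np
from scipy.stats import pearsonr

def pearson_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Squared Pearson correlation between ground-truth values and predictions."""
    r, _ = pearsonr(y_true, y_pred)
    return r ** 2

# A relative improvement over a baseline, as quoted in the table, would be:
# improvement = (pearson_r2(y, pred_new) - pearson_r2(y, pred_base)) / pearson_r2(y, pred_base)
```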
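
The Dataset Splits row describes holding out fixed test and validation sets of 2,000 samples each before subsampling training sets of different sizes. A sketch of that protocol under the assumption of a simple random permutation (the seed and helper name are hypothetical):

```python
# Sketch of the split protocol quoted in the Dataset Splits row: fix the test and
# validation partitions first, then draw training subsets of varying size from the
# remaining pool. Seed, ordering, and helper name are assumptions.
import numpy as np

def split_then_subsample(data: np.ndarray, train_size: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    test_idx, val_idx, train_pool = idx[:2000], idx[2000:4000], idx[4000:]
    train_idx = train_pool[:train_size]  # vary train_size; test/val stay fixed
    return data[train_idx], data[val_idx], data[test_idx]
```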
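
The Experiment Setup row lists two hyperparameter sets: a full fine-tuning configuration and a QLoRA configuration. The sketch below shows how they might be expressed with the Hugging Face transformers, peft, and bitsandbytes libraries; the paper reports the hyperparameters but not this exact code, and the output paths are illustrative.

```python
# Sketch of the reported configurations, assuming the Hugging Face stack.
import torch
from transformers import TrainingArguments, BitsAndBytesConfig
from peft import LoraConfig

# QLoRA setting: 4-bit NormalFloat quantization with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=64,                                # LoRA rank
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

qlora_args = TrainingArguments(
    output_dir="geollm-qlora",           # illustrative path
    num_train_epochs=4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    learning_rate=1.5e-3,
    weight_decay=0.001,
    max_grad_norm=0.3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,                           # bfloat16 mixed precision
    optim="adamw_torch",
)

# Full fine-tuning setting reported in the same row, shown for reference.
full_ft_args = TrainingArguments(
    output_dir="geollm-full-ft",         # illustrative path
    num_train_epochs=8,
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    weight_decay=0.1,
    warmup_ratio=0.1,
    lr_scheduler_type="cosine",
    optim="adamw_torch",
)
```

The quantization, mixed precision, and gradient checkpointing choices in the QLoRA setting are consistent with the single-GPU budget quoted in the Hardware Specification row.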