Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
Authors: Utkarsh Mall, Cheng Perng Phoo, Meilin Kelsey Liu, Carl Vondrick, Bharath Hariharan, Kavita Bala
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS AND RESULTS |
| Researcher Affiliation | Academia | Cornell University, Ithaca, NY; Columbia University, New York, NY |
| Pseudocode | No | The paper describes methods in text and equations but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code, data, and other resources are available at: https://graft.cs.cornell.edu |
| Open Datasets | Yes | To perform our training, we need a dataset of ground-satellite image pairs. We collected two such datasets for two different kinds of remote sensing imagery: NAIP (U.S.G.S., 2022) (high resolution, with 1 meter per pixel) and Sentinel-2 (Drusch et al., 2012) (low resolution, with 10 meters per pixel). Our dataset collection efforts yield 10.2 million pairs for NAIP and 8.7 million pairs for Sentinel-2 (also refer to Appendix A). |
| Dataset Splits | Yes | We select hyperparameters using a validation set with NAIP resolution that we collected. This validation set contains 14 categories with a total of 2632 single-label images. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like ViT-B/16 and AdamW, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train all models for 10 epochs using AdamW with weight decay set to 1e-2. For the image-level model, we linearly ramp up the learning rate from 0 to 1e-5 and then decrease it using a cosine schedule. For the pixel-level model, we linearly ramp up the learning rate from 0 to 5e-5 and then decrease it to zero using a cosine schedule. All models are initialized using CLIP's weights and the temperature hyperparameter is set to τ = 0.07. (A hedged training-loop sketch follows the table.) |
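
The Experiment Setup row fully specifies the optimizer, learning-rate schedule, and contrastive temperature, so a short sketch can make it concrete. The following is a minimal, hypothetical reconstruction, not the authors' released code: the tiny `Linear` encoders, batch size, and step counts (`TOTAL_STEPS`, `WARMUP_STEPS`, which the paper does not state) are placeholder assumptions, and the real models are initialized from CLIP's ViT-B/16 weights rather than built from scratch.

```python
# Hedged sketch of the reported training setup: AdamW (weight decay 1e-2),
# linear warmup to a peak learning rate, cosine decay to zero, and a
# CLIP-style symmetric contrastive loss with temperature tau = 0.07.
import math
import torch
import torch.nn.functional as F

PEAK_LR = 1e-5       # 1e-5 for the image-level model, 5e-5 for the pixel-level one
WEIGHT_DECAY = 1e-2
TAU = 0.07
TOTAL_STEPS = 1000   # placeholder; the paper specifies 10 epochs, not step counts
WARMUP_STEPS = 100   # placeholder; warmup length is not stated in the excerpt

def lr_at(step: int) -> float:
    """Linearly ramp the LR from 0 to PEAK_LR, then cosine-decay it to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Hypothetical stand-ins for the CLIP-initialized ground and satellite towers.
ground_enc = torch.nn.Linear(32, 16)
satellite_enc = torch.nn.Linear(32, 16)
params = list(ground_enc.parameters()) + list(satellite_enc.parameters())
optimizer = torch.optim.AdamW(params, lr=0.0, weight_decay=WEIGHT_DECAY)

for step in range(TOTAL_STEPS):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)

    # Dummy batch of paired ground / satellite features (real training uses
    # the ground-satellite image pairs described in the Open Datasets row).
    ground = F.normalize(ground_enc(torch.randn(8, 32)), dim=-1)
    satellite = F.normalize(satellite_enc(torch.randn(8, 32)), dim=-1)

    # Symmetric InfoNCE over the batch, scaled by the temperature: matched
    # pairs sit on the diagonal of the similarity matrix.
    logits = ground @ satellite.t() / TAU
    targets = torch.arange(8)
    loss = (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Setting the optimizer's initial `lr` to 0 and overwriting it each step keeps the warmup-then-cosine shape explicit; an equivalent approach would compose `torch.optim.lr_scheduler.LinearLR` and `CosineAnnealingLR` via `SequentialLR`.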