Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scale-aware Recognition in Satellite Images under Resource Constraints
Authors: Shreelekha Revankar, Cheng Perng Phoo, Utkarsh Kumar Mall, Bharath Hariharan, Kavita Bala
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach with multiple recognition models (supervised and open-vocabulary) and multiple satellite modalities. We find that compared to simply using high resolution images always, our framework improves accuracy up to 13 points while reducing the number of HR images used by 5×. Our approach also significantly outperforms (by more than 25 points) other prior work that trades off between accuracy and cost. In sum, our results demonstrate that our holistic reasoning of scale leads to significantly higher accuracy with large reductions in cost. |
| Researcher Affiliation | Academia | Shreelekha Revankar¹, Cheng Perng Phoo¹, Utkarsh Mall², Bharath Hariharan¹, Kavita Bala¹; ¹Cornell University, ²Columbia University. Corresponding Email: EMAIL |
| Pseudocode | No | The paper describes the methodology in Section 3 using textual descriptions and mathematical equations (e.g., equations 1, 2, 3, 4, 5) and a system overview flowchart (Figure 2), but does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | No | The abstract states, 'Resources are available on our website.' However, this is too vague to confirm the availability of source code for the methodology. Additionally, Section 4.1.2 states, 'All data and download scripts will be publicly released,' which indicates a future release and not immediate availability of the methodology's source code. |
| Open Datasets | Yes | We curate the following two benchmarks. Both benchmarks use Sentinel-2 as the low-resolution modality... In this benchmark, we use HR imagery (GSD=1m) captured by the National Agriculture Imagery Program (U.S. Geological Survey, 2024)... Our second benchmark leverages the NICFI Satellite Data Program Basemaps for Tropical Forest Monitoring (Planet Team, 2024)... All data was sourced from Google Earth Engine (Gorelick et al., 2017). Following Mall et al. (2024) we use Open Street Map contributors (2024) to obtain ground truth annotations for 40 concepts (listed in Appendix A.1). All data and download scripts will be publicly released. |
| Dataset Splits | Yes | We created a training dataset and validation dataset using images from the following states and regions: Arkansas, Delaware, Idaho, Maine, Rhode Island, Wyoming, and US Virgin Islands. These datasets are comprised of 45,885 Sentinel-2 images for LR, and 4,588,500 NAIP images for HR in the training dataset, and 4,938 and 493,800 images respectively for the validation dataset. Our testing imagery is comprised of images from D.C., Puerto Rico, and Hawaii. The test dataset is comprised of 5,015 Sentinel-2 and 505,100 NAIP images. |
| Hardware Specification | Yes | All inference is performed using a batch size of 32 on a single Nvidia RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions several models and platforms such as 'GRAFT', 'CLIP', 'ResNet-50', 'Google Earth Engine', and 'Open Street Map', but does not provide specific version numbers for any software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | All inference is performed using a batch size of 32 on a single Nvidia RTX A6000 GPU. For our experiments we set a budget of 1000 locations to acquire HR imagery, with each location covering roughly 5 sq. km. |
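The split sizes quoted in the Dataset Splits row imply a fixed pairing of high-resolution NAIP tiles to each low-resolution Sentinel-2 tile. A minimal arithmetic sanity check of those quoted counts (the dictionary and variable names here are illustrative, not from the paper):

```python
# HR/LR tile counts as quoted in the paper's dataset-splits description:
# (Sentinel-2 low-res count, NAIP high-res count) per split.
splits = {
    "train": (45_885, 4_588_500),
    "val":   (4_938,  493_800),
    "test":  (5_015,  505_100),
}

# Ratio of NAIP tiles per Sentinel-2 tile in each split.
ratios = {name: hr / lr for name, (lr, hr) in splits.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} NAIP tiles per Sentinel-2 tile")
```

The training and validation counts pair exactly 100 HR tiles per LR tile; the quoted test counts give roughly 100.7.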