GOMAA-Geo: GOal Modality Agnostic Active Geo-localization

Authors: Anindya Sarkar, Srikumar Sastry, Aleksis Pirinen, Chongjie Zhang, Nathan Jacobs, Yevgeniy Vorobeychik

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive evaluations, we show that GOMAA-Geo outperforms alternative learnable approaches and that it generalizes across datasets (e.g., to disaster-hit areas without seeing a single disaster scenario during training) and goal modalities (e.g., to ground-level imagery or textual descriptions), despite only being trained with goals specified as aerial views. Our code is available at: https://github.com/mvrl/GOMAA-Geo.
Researcher Affiliation | Collaboration | Anindya Sarkar (1), Srikumar Sastry (1), Aleksis Pirinen (2,3), Chongjie Zhang (1), Nathan Jacobs (1), Yevgeniy Vorobeychik (1). (1) Department of Computer Science and Engineering, Washington University in St. Louis; (2) RISE Research Institutes of Sweden; (3) Swedish Centre for Impacts of Climate Extremes (climes). Corresponding authors: {anindya, s.sastry}@wustl.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the methods in prose and equations.
Open Source Code | Yes | Our code is available at: https://github.com/mvrl/GOMAA-Geo.
Open Datasets | Yes | We primarily utilize the Massachusetts Buildings (Masa) dataset [20] for both the development and evaluation of GOMAA-Geo in settings where the goal content is provided as aerial imagery. ... To alleviate this, we have collected a dataset from different regions across the world, which allows for specifying the goal content as aerial imagery, ground-level imagery, or natural language text. ... We refer to this dataset as Multi-Modal Goal Dataset for Active Geolocalization (MM-GAG). ... Our dataset is publicly available at this link. [Appendix L]
Dataset Splits | Yes | We primarily utilize the Massachusetts Buildings (Masa) dataset [20] for both the development and evaluation of GOMAA-Geo in settings where the goal content is provided as aerial imagery. The dataset is split into 70% for training and 15% each for validation and testing.
Hardware Specification | Yes | Compute Resources: We use a single NVidia A100 GPU server with a memory of 80 GB for training and a single NVidia V100 GPU server with a memory of 32 GB for running the inference.
Software Dependencies | No | The paper mentions specific models and methods such as Falcon, CLIP, and PPO, but does not specify version numbers for general software dependencies (e.g., Python, PyTorch, CUDA) needed for full reproducibility.
Experiment Setup | Yes | Details of CLIP-MMFE Module: ... learning rate of 1e-4, a batch size of 256, the number of training epochs as 300, and the Adam optimizer... Details of GASP-based LLM Module: ... learning rate of 1e-4, batch size of 1, number of training epochs as 300, and the Adam optimizer... Planning module: ... learning rate of 1e-4, batch size of 1, number of training epochs as 300, and the Adam optimizer. We choose the values of α and β (as defined in equation 4) to be 0.5 and 0.01 respectively. We also choose the clipping ratio (ϵ) to be 0.2. We select discount factor γ to be 0.99 for all the experiments and copy the parameters of π onto π_old after every 4 epochs of policy training. (See the configuration sketch after this table.)
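
To make the quoted training settings easier to scan, the following is a minimal, hypothetical Python/PyTorch sketch that collects them into a single configuration and shows how the clipping ratio feeds a standard PPO clipped-surrogate update. The names `PlannerConfig` and `ppo_clip_loss`, and the toy linear policy, are illustrative assumptions, not code from the GOMAA-Geo repository.

```python
# Hypothetical sketch (not the authors' code): gathers the hyperparameters quoted
# under "Experiment Setup" and demonstrates a generic PPO clipped-surrogate step.
from dataclasses import dataclass

import torch


@dataclass
class PlannerConfig:
    lr: float = 1e-4          # Adam learning rate (reported for all three modules)
    batch_size: int = 1       # batch size reported for the LLM and planning modules
    epochs: int = 300         # number of training epochs
    alpha: float = 0.5        # weight alpha in the combined objective (eq. 4)
    beta: float = 0.01        # weight beta in the combined objective (eq. 4)
    clip_eps: float = 0.2     # PPO clipping ratio
    gamma: float = 0.99       # discount factor
    old_policy_sync: int = 4  # copy pi onto pi_old every 4 epochs of policy training


def ppo_clip_loss(log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Standard PPO clipped surrogate loss (returned as a quantity to minimize)."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


if __name__ == "__main__":
    cfg = PlannerConfig()
    policy = torch.nn.Linear(8, 4)  # stand-in for the planning policy network
    optimizer = torch.optim.Adam(policy.parameters(), lr=cfg.lr)

    # Dummy rollout data, only to show how the pieces are wired together.
    obs = torch.randn(16, 8)
    actions = torch.randint(0, 4, (16,))
    log_probs = torch.log_softmax(policy(obs), dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
    old_log_probs = log_probs.detach()
    advantages = torch.randn(16)

    loss = ppo_clip_loss(log_probs, old_log_probs, advantages, cfg.clip_eps)
    loss.backward()
    optimizer.step()
```

The sketch only covers the policy-gradient portion; how the α- and β-weighted terms of equation 4 are combined with this loss is described in the paper and released code, not reproduced here.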