CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
Authors: Anthony Fuller, Koreen Millard, James Green
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | CROMA outperforms the current SoTA multispectral model, evaluated on four classification benchmarks: finetuning (avg. 1.8%), linear (avg. 2.4%) and nonlinear (avg. 1.4%) probing, kNN classification (avg. 3.5%), and K-means clustering (avg. 8.4%); and three segmentation benchmarks (avg. 6.4%). |
| Researcher Affiliation | Academia | Department of Systems and Computer Engineering and Department of Geography and Environmental Studies, Carleton University, Ottawa, Canada |
| Pseudocode | No | The paper describes the model architecture and objectives in text and diagrams (Figure 1) but does not provide pseudocode or a clearly labeled algorithm block. (An illustrative sketch of the combined objective is given below the table.) |
| Open Source Code | Yes | Code and pretrained models: https://github.com/antofuller/CROMA |
| Open Datasets | Yes | We pretrain CROMA models on the SSL4EO dataset [70], a large geographically and seasonally diverse unlabeled dataset. ... The multi-label BigEarthNet dataset [76]... The fMoW-Sentinel dataset [26]... The EuroSAT dataset [77]... The Canadian Cropland dataset [78]... The DFC2020 dataset [87]... The Dynamic World dataset [88]... The MARIDA dataset [89] |
| Dataset Splits | Yes | The multi-label BigEarthNet dataset [76] (35,420 train samples and 118,065 validation samples); this is 10% of the complete BigEarthNet training set that is now used by default [25, 26] to reduce the costs of finetuning and is better suited for a remote sensing benchmark [22]. |
| Hardware Specification | Yes | We perform all pretraining experiments on an NVIDIA DGX server (8 A100 80 GB), including ablations. |
| Software Dependencies | No | The paper mentions using bfloat16 precision and the AdamW optimizer but does not specify versions for software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We use an NVIDIA DGX server (8 A100-80GB), the maximum batch size that can fit into 640 GB of VRAM (7,200 for our default ViT-B), bfloat16 precision, a base learning rate of 4e-6, warmup for 5% of the total epochs, and cooldown via a cosine decay schedule. We use the same normalization procedure as SatMAE [26]. For data augmentation, we randomly crop 60-180 pixel squares from the original 264×264 pixels and resize the crops to 120×120 pixels (our default image size). We also perform vertical and horizontal flipping, 90-degree rotations, and mixup=0.3. (A hedged sketch of this training setup follows below the table.) |
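
Because the paper provides no pseudocode, the following is a minimal, illustrative sketch of how a contrastive radar-optical term and a masked-autoencoding term of the kind named in the title could be combined into one training loss. The function name `contrastive_mae_loss`, the symmetric InfoNCE formulation, the temperature, and the equal loss weighting are assumptions for illustration, not CROMA's actual objective; the authors' implementation is in the linked repository.

```python
import torch
import torch.nn.functional as F

def contrastive_mae_loss(radar_emb, optical_emb, pred_patches, target_patches, mask,
                         temperature=0.07, recon_weight=1.0):
    """Illustrative combined objective: symmetric InfoNCE between paired
    radar/optical embeddings plus an MAE-style reconstruction loss on masked patches.

    radar_emb, optical_emb: (B, D) pooled embeddings of the two modalities.
    pred_patches, target_patches: (B, N, P) decoder outputs and raw patch pixels.
    mask: (B, N) binary mask, 1 where a patch was masked out and must be reconstructed.
    """
    # Cross-modal contrastive term: matching radar/optical pairs sit on the diagonal.
    r = F.normalize(radar_emb, dim=-1)
    o = F.normalize(optical_emb, dim=-1)
    logits = r @ o.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(r.size(0), device=r.device)
    contrastive = 0.5 * (F.cross_entropy(logits, targets) +
                         F.cross_entropy(logits.t(), targets))

    # MAE-style reconstruction term, averaged over masked patches only.
    per_patch_mse = ((pred_patches - target_patches) ** 2).mean(dim=-1)   # (B, N)
    recon = (per_patch_mse * mask).sum() / mask.sum().clamp(min=1)

    return contrastive + recon_weight * recon
```

In practice the two embeddings and the patch predictions would come from separate radar and optical encoders plus a lightweight decoder; the sketch only shows how the two objectives might be combined.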
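The quoted experiment setup (random 60-180 px crops resized to 120×120, flips, 90-degree rotations, AdamW at a base learning rate of 4e-6, 5% warmup followed by cosine decay, bfloat16 precision) can be sketched roughly as below. The helper names, the weight-decay value, and the commented training loop are assumptions; mixup, the SatMAE normalization, and any batch-size scaling of the learning rate are omitted.

```python
import math
import torch
import torch.nn.functional as F

def random_crop_flip_rotate(batch, out_size=120, min_crop=60, max_crop=180):
    """Crop a random square of 60-180 px, resize it to 120x120, then apply random
    flips and a random 90-degree rotation (applied batch-wise here for brevity).
    `batch` is (B, C, 264, 264); the channel count is arbitrary."""
    _, _, H, W = batch.shape
    size = int(torch.randint(min_crop, max_crop + 1, (1,)))
    top = int(torch.randint(0, H - size + 1, (1,)))
    left = int(torch.randint(0, W - size + 1, (1,)))
    crop = batch[:, :, top:top + size, left:left + size]
    crop = F.interpolate(crop, size=(out_size, out_size),
                         mode="bilinear", align_corners=False)
    if torch.rand(1) < 0.5:
        crop = torch.flip(crop, dims=[-1])   # horizontal flip
    if torch.rand(1) < 0.5:
        crop = torch.flip(crop, dims=[-2])   # vertical flip
    return torch.rot90(crop, k=int(torch.randint(0, 4, (1,))), dims=[-2, -1])

def make_scheduler(optimizer, total_steps, warmup_frac=0.05):
    """Linear warmup for 5% of the steps, then cosine decay to zero."""
    warmup_steps = max(1, int(warmup_frac * total_steps))

    def lr_lambda(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Hypothetical training loop showing how the pieces fit together:
# optimizer = torch.optim.AdamW(model.parameters(), lr=4e-6, weight_decay=0.05)  # weight decay assumed
# scheduler = make_scheduler(optimizer, total_steps=num_epochs * steps_per_epoch)
# for batch in loader:
#     batch = random_crop_flip_rotate(batch.cuda())
#     with torch.autocast(device_type="cuda", dtype=torch.bfloat16):   # bfloat16 precision
#         loss = model(batch)
#     loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```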