Learning De-Biased Representations for Remote-Sensing Imagery

Authors: Zichen Tian, Zhaozheng Chen, Qianru Sun

NeurIPS 2024

Each row below gives a reproducibility variable, the assessed result, and the supporting LLM response (quoted from the paper where available).
Research Type: Experimental. "We conduct extensive experiments in two transfer learning scenarios in the RS domain: from natural to optical RS images, and from optical RS to multi-spectrum RS images. We perform object classification and oriented object detection tasks on the optical RS dataset DOTA and the SAR dataset FUSRS. Results show that our debLoRA consistently surpasses prior arts across these RS adaptation settings, yielding up to 3.3 and 4.7 percentage-point gains on the tail classes for the natural → optical RS and optical RS → multi-spectrum RS adaptations, respectively, while preserving the performance on head classes, substantiating its efficacy and adaptability."
Researcher Affiliation: Academia. "Zichen Tian, Zhaozheng Chen, Qianru Sun. School of Computing and Information Systems, Singapore Management University. {zichen.tian.2023,zzchen.2019}@phdcs.smu.edu.sg, qianrusun@smu.edu.sg"
Pseudocode: Yes. "The complete algorithm of debLoRA is summarized in Algorithm 1." Algorithm 1 (debLoRA). Require: long-tailed training set D = {(x, y)}; pre-trained encoder fθ: X → Z; number of clusters K; balance factor ρ. Ensure: a LoRA module gϕ that de-biases fθ.
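For concreteness, below is a minimal Python sketch of the clustering-and-calibration step that Algorithm 1 summarizes: cluster the encoder features, re-estimate each cluster center with class-balanced weights, then pull every feature toward its de-biased center. The function name debias_features, the choice of k-means, the 1/n_class weighting, and the single alpha factor (standing in for the balance factor ρ here and the calibration factor α in the Experiment Setup row) are assumptions reconstructed from the quoted summary, not the authors' released code.

import numpy as np
from sklearn.cluster import KMeans

def debias_features(feats, labels, class_counts, K=32, alpha=0.5):
    # feats:        (N, D) encoder features after global average pooling
    # labels:       (N,) integer class labels
    # class_counts: dict mapping class -> number of training samples
    # K:            number of clusters (the paper sets K = 32)
    # alpha:        calibration strength, set inversely proportional to
    #               the imbalance ratio of the tail class (Section 4.4)
    km = KMeans(n_clusters=K, n_init=10).fit(feats)
    assign = km.labels_

    # Re-estimate each cluster center with class-balanced weights so that
    # abundant head-class samples do not dominate the center.
    centers = np.zeros((K, feats.shape[1]))
    for k in range(K):
        idx = np.where(assign == k)[0]
        weights = np.array([1.0 / class_counts[labels[i]] for i in idx])
        centers[k] = np.average(feats[idx], axis=0, weights=weights)

    # Pull every feature toward its de-biased center; the calibrated
    # features can then serve as regression targets when training the
    # de-biasing LoRA module g_phi.
    return alpha * centers[assign] + (1.0 - alpha) * feats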
Open Source Code: Yes. "Code: https://github.com/doem97/deblora"
Open Datasets: Yes. "We use the DOTA dataset [10], a large-scale benchmark for RS object recognition. DOTA contains 188,282 instances from 15 categories, covering various scales, orientations, and shapes. We selected the FUSAR-Ship [21] and SRSDD [31] datasets as our source datasets due to their high resolution (<10 m) and fine-grained ship subcategories."
Dataset Splits: No. The paper mentions using the DOTA dataset, which has existing splits, and discusses combining the FUSAR-Ship and SRSDD datasets, with test-sample counts provided in the appendix. However, it does not explicitly state the training/validation/test split percentages or the total sample counts needed to reproduce all experiments.
Hardware Specification: No. The paper reports 'GPU hours' for its method and names specific GPU models (e.g., A100) only when discussing other works; it does not specify the GPU models or other hardware used for its own experiments in the main text or appendix.
Software Dependencies: No. The paper does not provide version numbers for the software dependencies or libraries used in its experiments.
Experiment Setup: Yes. "Implementation Details. 1) Fine-tuning baseline. We fine-tune the foundation models until the training loss stabilizes. During inference, we use null prompts as no ground truth is available. For SD, we extract features from the U-Net after applying one denoising step [50]. For OpenCLIP, we extract features from its visual encoder's final layer before the projection head. 2) LoRA and variants. We apply LoRA modules to all linear layers in the foundation models. We use a rank of 8 for LoRA, as it suffers from the most severe long-tail issues. We also validate our method with higher ranks (e.g., 64) in Table 2. During inference, we extract features from the U-Net encoder output followed by global average pooling (GAP). For cLoRA, we concatenate the category-specific features after GAP. 3) debLoRA. debLoRA involves two hyperparameters: the calibration factor α and the number of clusters K. We set α inversely proportional to the imbalance ratio of the tail class, as described in Section 4.4. We empirically set K = 32 (ablation studies on K are provided in the Appendix)."
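As an illustration of item 2) above, here is a minimal PyTorch sketch: a rank-8 low-rank adapter wrapped around frozen linear layers, with features taken as a global average pool (GAP) over the encoder's output map. The names LoRALinear, add_lora_to_linears, and extract_features are hypothetical helpers written for this sketch, and the assumption that the encoder returns a (B, C, H, W) feature map is ours; the authors' actual implementation is in the repository linked above.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen nn.Linear plus a trainable low-rank update: y = Wx + s*B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 8, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the LoRA factors are trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def add_lora_to_linears(module: nn.Module, rank: int = 8):
    """Recursively wrap every nn.Linear in `module` with LoRALinear."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, LoRALinear(child, rank=rank))
        else:
            add_lora_to_linears(child, rank)

def extract_features(encoder: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Assumed (B, C, H, W) encoder output, reduced by GAP to (B, C)."""
    with torch.no_grad():
        fmap = encoder(x)
    return fmap.mean(dim=(2, 3))  # global average pooling

Initializing B to zeros makes the low-rank update vanish at the start of training, so fine-tuning begins exactly from the frozen pre-trained behavior; this mirrors standard LoRA practice.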