SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Authors: Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, Ram Rajagopal
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With continual pre-training on this dataset, we obtain a VLM that surpasses baseline models with a 6.2% average accuracy gain in zero-shot scene classification across seven benchmark datasets. |
| Researcher Affiliation | Academia | Stanford University {zhecheng, rajanie, tianyuah, ramr}@stanford.edu, jiajunwu@cs.stanford.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The dataset and associated models are publicly available at https://github.com/wangzhecheng/SkyScript |
| Open Datasets | Yes | The dataset and associated models are publicly available at https://github.com/wangzhecheng/SkyScript |
| Dataset Splits | No | The paper sets aside 30,000 image-text pairs for testing cross-modal retrieval and mentions an auxiliary classification dataset, but it does not specify the training/validation/test splits for the main SkyScript dataset or how the auxiliary dataset is partitioned for validation and training. |
| Hardware Specification | Yes | The continual pre-training is conducted on 4 NVIDIA A100 GPUs with a batch size of 512 and total epochs of 20. |
| Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., Python 3.8, PyTorch 1.9) for reproducibility. |
| Experiment Setup | Yes | The continual pre-training is conducted on 4 NVIDIA A100 GPUs with a batch size of 512 and total epochs of 20. |
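The paper reports the pre-training scale (4 A100 GPUs, batch size 512, 20 epochs) but the table above does not quote the training objective. Continual pre-training of a CLIP-style VLM on image-text pairs typically optimizes a symmetric contrastive (InfoNCE) loss over each batch; the sketch below is an assumed illustration of that standard objective in NumPy, not code taken from the paper (the `temperature` value is a common default, not a reported hyperparameter).

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays where row i of each is a matched
    image-caption pair; all other rows in the batch act as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature       # (N, N); matched pairs on the diagonal
    labels = np.arange(logits.shape[0])

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

With perfectly aligned pairs the diagonal dominates and the loss approaches zero; shuffling the captions against the images drives it up, which is the signal the contrastive objective trains on.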