COSMIC: Compress Satellite Image Efficiently via Diffusion Compensation

Authors: Ziyuan Zhang, Han Qiu, Maosen Zhang, Jun Liu, Bin Chen, Tianwei Zhang, Hewu Li

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that COSMIC outperforms state-of-the-art baselines on both perceptual and distortion metrics.
Researcher Affiliation | Academia | Ziyuan Zhang (1), Han Qiu (1)*, Maosen Zhang (1), Jun Liu (1)*, Bin Chen (2), Tianwei Zhang (3), Hewu Li (1); (1) Tsinghua University, China; (2) Harbin Institute of Technology, Shenzhen, China; (3) Nanyang Technological University, Singapore
Pseudocode | Yes | We present a more detailed explanation of our two-stage training and inference pipeline in Algorithm 1 and Algorithm 2.
Open Source Code | Yes | The code is publicly available at https://github.com/Joanna-0421/COSMIC.
Open Datasets | Yes | We use the Functional Map of the World (fMoW) [13], which has 62 categories and in which each image is paired with different types of metadata features, as our training data and test data.
Dataset Splits | No | The paper mentions "training data and test data" but does not specify a validation split percentage, sample counts, or how a validation set was created or used for hyperparameter tuning. It states that training images are randomly cropped to 256×256 pixels, but gives no split details.
Hardware Specification | Yes | All the training experiments are performed on 10 NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer and DDIM sampling but does not provide version numbers for these or for other key software components and libraries (e.g., Python, PyTorch/TensorFlow).
Experiment Setup | Yes | First, we train the image compression encoder, the image encoder, and the image decoder together using the loss L_IC for 100 epochs with a batch size of 32. Second, we freeze the parameters of the model trained in the first stage, use the pretrained Stable Diffusion model for the noise prediction network, and finetune it using L_ldm for 10 epochs with a batch size of 4. All the training experiments are performed on 10 NVIDIA GeForce RTX 3090 GPUs using the Adam optimizer with lr = 1e-4 and λ ∈ {0.00067, 0.0013, 0.0026, 0.005}.
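
The only preprocessing detail the dataset rows give is that training images are randomly cropped to 256×256 pixels. Below is a minimal sketch of such a crop pipeline, assuming torchvision; the paper does not name its framework in this section, and the folder path and transform choices here are illustrative, not taken from the released COSMIC code.

```python
# Hedged sketch: random 256x256 training crops for fMoW-style images.
# Assumes torchvision is available; the actual COSMIC preprocessing may differ.
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomCrop(256),  # paper: training images are randomly cropped to 256x256
    transforms.ToTensor(),       # HWC uint8 -> CHW float in [0, 1]
])

# Hypothetical folder layout; fMoW itself ships with its own metadata format.
train_set = datasets.ImageFolder("data/fmow/train", transform=train_transform)
```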
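
The experiment-setup row describes a two-stage schedule: stage one trains the compression encoder/decoder with L_IC (100 epochs, batch size 32), and stage two freezes those weights and finetunes a Stable-Diffusion-initialised noise prediction network with L_ldm (10 epochs, batch size 4), both with Adam at lr = 1e-4. The sketch below mirrors only that schedule in generic PyTorch; the tiny modules and placeholder losses stand in for COSMIC's actual architecture and objectives, which are defined in the released code.

```python
# Hedged sketch of the reported two-stage training schedule in generic PyTorch.
# The tiny modules and losses below are placeholders so the script runs end to end;
# they are NOT the architecture or objectives of the released COSMIC code.
import torch
import torch.nn as nn
import torch.nn.functional as F

LR = 1e-4
LAMBDAS = [0.00067, 0.0013, 0.0026, 0.005]  # rate-distortion trade-off values quoted from the paper


class TinyCompressionAutoencoder(nn.Module):
    """Stand-in for the stage-1 compression encoder / image encoder / image decoder."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 8, 3, stride=2, padding=1)
        self.decoder = nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


class TinyNoisePredictor(nn.Module):
    """Stand-in for the Stable-Diffusion-initialised noise prediction network of stage 2."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(8, 8, 3, padding=1)

    def forward(self, z):
        return self.net(z)


def train_stage1(model, loader, lam, epochs=100):
    # Stage 1 (paper): train the compression model with L_IC, 100 epochs, batch size 32, Adam, lr = 1e-4.
    opt = torch.optim.Adam(model.parameters(), lr=LR)
    for _ in range(epochs):
        for x in loader:
            x_hat, z = model(x)
            loss = F.mse_loss(x_hat, x) + lam * z.abs().mean()  # placeholder for the paper's L_IC
            opt.zero_grad()
            loss.backward()
            opt.step()


def train_stage2(comp_model, noise_model, loader, epochs=10):
    # Stage 2 (paper): freeze stage-1 parameters and finetune the noise predictor with L_ldm,
    # 10 epochs, batch size 4, Adam, lr = 1e-4.
    for p in comp_model.parameters():
        p.requires_grad_(False)  # stage-1 weights stay frozen
    opt = torch.optim.Adam(noise_model.parameters(), lr=LR)
    for _ in range(epochs):
        for x in loader:
            with torch.no_grad():
                _, z = comp_model(x)
            noise = torch.randn_like(z)
            pred = noise_model(z + noise)
            loss = F.mse_loss(pred, noise)  # placeholder for the paper's L_ldm
            opt.zero_grad()
            loss.backward()
            opt.step()


# Toy usage with random tensors standing in for 256x256 fMoW crops.
loader = [torch.rand(2, 3, 256, 256) for _ in range(2)]
comp, noise_net = TinyCompressionAutoencoder(), TinyNoisePredictor()
train_stage1(comp, loader, LAMBDAS[0], epochs=1)
train_stage2(comp, noise_net, loader, epochs=1)
```

The LAMBDAS list holds the four rate-distortion trade-off values quoted from the paper; presumably each λ yields a separate stage-1 model, as is common in learned image compression, though the paper excerpt above does not state this explicitly.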