Neural Image Compression: Generalization, Robustness, and Spectral Biases

Authors: Kelsey Lieberman, James Diffenderfer, Charles Godfrey, Bhavya Kailkhura

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type Experimental "First, this paper presents a comprehensive benchmark suite to evaluate the out-of-distribution (OOD) performance of image compression methods. Specifically, we provide CLIC-C and Kodak-C by introducing 15 corruptions to the popular CLIC and Kodak benchmarks. Next, we propose spectrally-inspired inspection tools to gain deeper insight into errors introduced by image compression methods as well as their OOD performance. We then carry out a detailed performance comparison of several classic codecs and NIC variants, revealing intriguing findings that challenge our current understanding of the strengths and limitations of NIC. Finally, we corroborate our empirical findings with theoretical analysis, providing an in-depth view of the OOD performance of NIC and its dependence on the spectral properties of the data."
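The paper's "spectrally-inspired inspection tools" examine compression error as a function of spatial frequency. A minimal sketch of one such tool is a radially averaged power spectrum of the error image; this is an illustrative assumption about what such a tool could look like, not the paper's exact definition.

```python
import numpy as np

def radial_error_spectrum(x, x_hat):
    """Radially averaged power spectrum of the compression error x - x_hat.

    A hypothetical 'spectrally-inspired' inspection tool: low indices of the
    returned array correspond to low spatial frequencies, high indices to
    high frequencies. The paper's actual tooling may differ in detail.
    """
    err = x - x_hat
    # Shift the 2D FFT so the DC component sits at the array center.
    f = np.fft.fftshift(np.fft.fft2(err))
    power = np.abs(f) ** 2
    h, w = err.shape
    cy, cx = h // 2, w // 2
    # Integer radial distance of each pixel from the spectrum center.
    yy, xx = np.indices((h, w))
    r = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2).astype(int)
    # Mean power within each radial frequency bin.
    total = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return total / counts
```

Comparing such spectra across codecs shows whether a method's errors concentrate in high-frequency detail (typical of learned codecs) or spread across the band.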
Researcher Affiliation Collaboration Kelsey Lieberman Department of Computer Science Duke University Durham, NC USA kelsey.lieberman@duke.edu James Diffenderfer Lawrence Livermore National Laboratory Livermore, CA USA diffenderfer2@llnl.gov Charles Godfrey Thomson Reuters Labs Eagan, MN USA charles.godfrey@thomsonreuters.com Bhavya Kailkhura Lawrence Livermore National Laboratory Livermore, CA USA kailkhura1@llnl.gov
Pseudocode No The paper describes methods and experiments in detail but does not include any explicit pseudocode blocks or algorithms.
Open Source Code Yes Code and data will be made available at https://github.com/klieberman/ood_nic.
Open Datasets Yes To evaluate NIC in the presence of environmental or digital distribution shifts, we generated variants of the CLIC and Kodak datasets, which we refer to as CLIC-C and Kodak-C. Following the techniques presented in [26] for studying the performance of DNN classifiers encountering distributional shifts "in the wild", our -C datasets consist of images augmented by 15 common corruptions. For each image in the original dataset, the -C dataset contains a corrupted version of the image for each of the 15 common corruptions, and for each of five corruption severity levels, with 1 being the lowest severity and 5 being the highest. A sample of some corruptions on CLIC-C is provided in Figure 1a. We utilize two benchmark datasets for evaluating performance of our NIC models. For training and testing, we make use of CLIC (Challenge on Learned Image Compression) 2020 [59]. This collection is comprised of 1633 training, 102 validation, and 428 test images. The images are of varying resolution and are further categorized as either professional or mobile. To evaluate generalization, we make use of the Kodak [34] dataset which consists of 24 images of resolution 768 × 512 (or 512 × 768).
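The -C dataset construction described above (each image crossed with every corruption and every severity level 1–5) can be sketched as follows. The authors used the imagecorruptions library for the actual 15 corruptions; the two corruption functions below are simplified stand-ins for illustration only.

```python
import numpy as np

# Hypothetical stand-ins for two of the 15 corruptions; the paper's -C
# datasets were generated with github.com/bethgelab/imagecorruptions.
def gaussian_noise(img, severity):
    # Noise standard deviation grows with severity level (1-5).
    std = [8, 16, 24, 32, 48][severity - 1]
    noisy = img.astype(np.float64) + np.random.normal(0.0, std, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def brightness(img, severity):
    # Additive brightness shift, larger at higher severity.
    shift = [10, 20, 30, 40, 60][severity - 1]
    return np.clip(img.astype(np.int32) + shift, 0, 255).astype(np.uint8)

CORRUPTIONS = {"gaussian_noise": gaussian_noise, "brightness": brightness}

def make_c_dataset(images):
    """Emit one corrupted copy per (image, corruption, severity) triple,
    mirroring how CLIC-C / Kodak-C enumerate 15 corruptions x 5 severities."""
    out = {}
    for i, img in enumerate(images):
        for name, corrupt_fn in CORRUPTIONS.items():
            for severity in range(1, 6):
                out[(i, name, severity)] = corrupt_fn(img, severity)
    return out
```

With the full set of 15 corruptions, each source image yields 75 corrupted variants, which is why the -C benchmarks are substantially larger than their clean counterparts.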
Dataset Splits Yes We utilize two benchmark datasets for evaluating performance of our NIC models. For training and testing, we make use of CLIC (Challenge on Learned Image Compression) 2020 [59]. This collection is comprised of 1633 training, 102 validation, and 428 test images.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU models, CPU types, or cloud instance names) used for running its experiments.
Software Dependencies No The paper mentions using "pytorch model architecture in the compressai repository [10]" and that they "used github.com/bethgelab/imagecorruptions to apply corruptions to Kodak and CLIC images." However, it does not specify exact version numbers for PyTorch or other software dependencies.
Experiment Setup Yes We train NIC models using the train split of the CLIC 2020 dataset with batches of size 8 and random crops of size 256 × 256. For SH NIC, we set N = M = 192 and train for 5,000 epochs (about 1M iterations). We trained 8 models with λs of 0.0012, 0.005, 0.01, 0.03, 0.05, 0.1, 0.15, and 0.26. We used the PyTorch model architecture in the compressai repository [10]. For ELIC, we use N = 192, M = 320 and train for 3,900 epochs. We trained 11 models with unique λs (0.001, 0.0025, 0.003, 0.004, 0.006, 0.008, 0.016, 0.025, 0.032, 0.05, 0.15). We utilized this repository for the model architecture, but trained our own models on CLIC rather than experimenting with the publicly available checkpoints trained on ImageNet. Note that we trained more models for ELIC than FR NIC in order to obtain bpps and PSNRs which fit our fixed-bpp/PSNR constraint.
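Each λ in the sweeps above sets the rate-distortion trade-off in the training objective. A minimal sketch of that objective, assuming the convention used by compressai's RateDistortionLoss (images scaled to [0, 1], distortion weighted by 255²):

```python
import numpy as np

def rate_distortion_loss(x, x_hat, bits, lmbda):
    """L = lambda * 255^2 * MSE(x, x_hat) + bpp.

    Sketch of the standard NIC training objective; `bits` is the estimated
    code length for the image and bpp is bits per pixel. Larger lambda
    favors fidelity (higher PSNR) at the cost of a higher bitrate.
    """
    num_pixels = x.shape[0] * x.shape[1]
    mse = np.mean((x - x_hat) ** 2)
    bpp = bits / num_pixels
    return lmbda * 255 ** 2 * mse + bpp
```

Training separate models across a grid of λ values, as done here for SH NIC and ELIC, traces out each architecture's rate-distortion curve.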