Modality-Agnostic Variational Compression of Implicit Neural Representations

Authors: Jonathan Richard Schwarz, Jihoon Tack, Yee Whye Teh, Jaeho Lee, Jinwoo Shin

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate strong results over a large set of diverse modalities using the same algorithm without any modality-specific inductive biases. We show results on images, climate data, 3D shapes and scenes as well as audio and video, introducing VC-INR as the first INR-based method to outperform codecs as well-known and diverse as JPEG 2000, MP3 and AVC/HEVC on their respective modalities.
Researcher Affiliation | Collaboration | ¹DeepMind ²University College London ³KAIST ⁴POSTECH.
Pseudocode | Yes | Algorithms 1 and 2 show details of the Meta-Learning (introduced in the previous section) and quantisation learning stages. (A hedged sketch of the meta-learning loop follows this table.)
Open Source Code | No | The paper does not contain an explicit statement about releasing their code or provide a link to a code repository for the described methodology.
Open Datasets | Yes | We verify VC-INR on various data modalities, including image, voxel, scene, climate, audio, and video datasets. Overall, our experimental results demonstrate strong results, consistently outperforming previous INR-based compression methods and improving on popular compression schemes such as MP3 on audio and AVC/HEVC on video clips. In particular, VC-INR achieves new state-of-the-art results on modality-agnostic compression with INRs, improving the Peak Signal-to-Noise Ratio (PSNR) at the same bits-per-pixel (bpp) bit rate by 3.3 dB for CIFAR-10 (Krizhevsky et al., 2009), by 2 dB on Kodak (both images), 3.5 dB for ERA5 (climate data) (Hersbach et al., 2019) and 9.5 dB for LibriSpeech (audio) (Panayotov et al., 2015), respectively. (The PSNR/bpp metrics are sketched in code after this table.)
Dataset Splits | Yes | Following (Dupont et al., 2022a), we divide the dataset into 27,000 training examples and 3,000 test examples, and pre-process the pixel coordinates into [0, 1]² with feature values ranging from 0 to 1. For meta-learning, we also train the model on randomly cropped 32×32 patches; for evaluation, we split the image into non-overlapping patches, with a separate modulation adapted on each patch. (A sketch of this preprocessing follows the table.)
Hardware Specification | No | The paper mentions "Num devices {8}" in the hyperparameter tables, which is too general and does not specify concrete hardware components such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions software components such as SIREN, Adam, SELU activation, and LayerNorm, but it does not provide specific version numbers for any of these libraries or frameworks.
Experiment Setup | Yes | Appendix G (Hyperparameters) includes Tables 3-7, listing the hyperparameters for the compression experiments on CIFAR-10, Div2k/Kodak, ERA5 (16 ), LibriSpeech, and UCF-101, respectively. These tables detail parameters such as 'Batch size per device', 'Outer learning rate', 'Num inner steps', 'Network depth', 'Network width', 'dim(ϕ)', and 'λ (L_distortion penalty)'. (A configuration sketch follows this table.)
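
The Pseudocode row refers to the paper's Algorithms 1 and 2 (meta-learning and quantisation learning). Below is a minimal sketch of what the meta-learning stage typically looks like in a functa/CAVIA-style setup: a shared SIREN backbone plus a per-signal modulation vector adapted in an inner loop, with the shared weights updated in the outer loop. This is an illustration under those assumptions, not the authors' implementation; the names (ModulatedSiren, inner_adapt) and all hyperparameter values are hypothetical, and the quantisation stage of Algorithm 2 is not shown.

```python
# Hedged sketch of meta-learning shared INR weights with per-signal modulations.
import torch
import torch.nn as nn

class ModulatedSiren(nn.Module):
    """Tiny SIREN-style MLP whose hidden activations are shifted by a
    per-signal modulation vector (functa/CAVIA-style). Hypothetical names."""
    def __init__(self, in_dim=2, hidden=64, out_dim=3, depth=3, w0=30.0):
        super().__init__()
        self.w0, self.hidden = w0, hidden
        dims = [in_dim] + [hidden] * depth
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(depth)]
        )
        self.out = nn.Linear(hidden, out_dim)
        self.mod_dim = hidden * depth  # dim(phi): one shift per hidden unit per layer

    def forward(self, coords, mods):
        # mods: flat modulation vector for one signal, shape (mod_dim,)
        shifts = mods.view(len(self.layers), self.hidden)
        h = coords
        for i, layer in enumerate(self.layers):
            h = torch.sin(self.w0 * (layer(h) + shifts[i]))
        return self.out(h)

def inner_adapt(model, coords, targets, n_steps=3, lr=1e-2):
    """Inner loop: fit only the modulation vector to a single signal."""
    mods = torch.zeros(model.mod_dim, requires_grad=True)
    for _ in range(n_steps):
        loss = ((model(coords, mods) - targets) ** 2).mean()
        (grad,) = torch.autograd.grad(loss, mods, create_graph=True)
        mods = mods - lr * grad
    return mods

# Outer loop: update shared weights through the inner adaptation (one signal shown).
model = ModulatedSiren()
outer_opt = torch.optim.Adam(model.parameters(), lr=3e-6)
coords = torch.rand(1024, 2)    # pixel coordinates in [0, 1]^2
targets = torch.rand(1024, 3)   # pixel values in [0, 1]
mods = inner_adapt(model, coords, targets)
outer_loss = ((model(coords, mods) - targets) ** 2).mean()
outer_opt.zero_grad()
outer_loss.backward()
outer_opt.step()
```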
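
The Open Datasets row quotes rate-distortion results as PSNR gains at a matched bits-per-pixel (bpp) rate. For reference, here is a minimal sketch of how both quantities are computed for signals normalised to [0, 1]; the helper names are ours, not the paper's.

```python
# Hedged sketch of the PSNR and bpp metrics used to report rate-distortion results.
import torch

def psnr(x, x_hat, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for signals scaled to [0, max_val]."""
    mse = torch.mean((x - x_hat) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

def bits_per_pixel(num_code_bits, height, width):
    """Rate of a compressed representation (e.g. quantised modulations) per pixel."""
    return num_code_bits / (height * width)

x = torch.rand(3, 32, 32)
x_hat = (x + 0.01 * torch.randn_like(x)).clamp(0, 1)
print(f"PSNR: {psnr(x, x_hat).item():.2f} dB, "
      f"bpp: {bits_per_pixel(num_code_bits=3072, height=32, width=32):.2f}")
```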
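
The Dataset Splits row mentions normalising pixel coordinates to [0, 1]², meta-training on random 32×32 crops, and evaluating on non-overlapping patches (each with its own modulation). The sketch below shows one plausible way to implement that preprocessing; the function names are illustrative and the exact pipeline may differ from the authors'.

```python
# Hedged sketch of the patch-based preprocessing described in the Dataset Splits row.
import torch

def coordinate_grid(h, w):
    """Pixel coordinates normalised to [0, 1]^2, shape (h*w, 2)."""
    ys = torch.linspace(0.0, 1.0, h)
    xs = torch.linspace(0.0, 1.0, w)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1)
    return grid.reshape(-1, 2)

def random_crop(image, size=32):
    """Random size x size crop used during meta-learning (image: C x H x W)."""
    _, h, w = image.shape
    top = torch.randint(0, h - size + 1, (1,)).item()
    left = torch.randint(0, w - size + 1, (1,)).item()
    return image[:, top:top + size, left:left + size]

def non_overlapping_patches(image, size=32):
    """Evaluation-time split into non-overlapping patches."""
    c, _, _ = image.shape
    patches = image.unfold(1, size, size).unfold(2, size, size)  # C x nH x nW x s x s
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, size, size)

img = torch.rand(3, 768, 512)                 # e.g. a Kodak-sized image in [0, 1]
train_patch = random_crop(img)                # 3 x 32 x 32 crop for meta-learning
eval_patches = non_overlapping_patches(img)   # 384 x 3 x 32 x 32 patches for evaluation
coords = coordinate_grid(32, 32)              # 1024 x 2 coordinate inputs to the INR
```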
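
The Experiment Setup row points to per-dataset hyperparameter tables (Tables 3-7). As a reading aid, here is a hedged sketch of how those fields could be collected into a configuration object; every value below is an illustrative placeholder (except num_devices=8, quoted in the Hardware Specification row), not the paper's actual setting.

```python
# Hedged sketch of a per-dataset configuration mirroring the fields in Tables 3-7.
from dataclasses import dataclass

@dataclass
class CompressionConfig:
    batch_size_per_device: int      # "Batch size per device"
    num_devices: int                # "Num devices"
    outer_learning_rate: float      # "Outer learning rate"
    num_inner_steps: int            # "Num inner steps"
    network_depth: int              # "Network depth"
    network_width: int              # "Network width"
    modulation_dim: int             # "dim(phi)"
    distortion_penalty: float       # "lambda (L_distortion penalty)"

# Placeholder values only; fill these in from the paper's Appendix G tables.
cifar10_cfg = CompressionConfig(
    batch_size_per_device=32, num_devices=8, outer_learning_rate=3e-6,
    num_inner_steps=3, network_depth=10, network_width=512,
    modulation_dim=512, distortion_penalty=1e-3,
)
```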