Modality-Agnostic Variational Compression of Implicit Neural Representations
Authors: Jonathan Richard Schwarz, Jihoon Tack, Yee Whye Teh, Jaeho Lee, Jinwoo Shin
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate strong results over a large set of diverse modalities using the same algorithm without any modality-specific inductive biases. We show results on images, climate data, 3D shapes and scenes as well as audio and video, introducing VC-INR as the first INR-based method to outperform codecs as well-known and diverse as JPEG 2000, MP3 and AVC/HEVC on their respective modalities. |
| Researcher Affiliation | Collaboration | 1DeepMind, 2University College London, 3KAIST, 4POSTECH. |
| Pseudocode | Yes | Algorithms 1 and 2 show details of the Meta-Learning (introduced in the previous section) and quantisation learning stages. (An illustrative sketch of the inner/outer adaptation pattern appears below the table.) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing their code or provide a link to a code repository for the described methodology. |
| Open Datasets | Yes | We verify VC-INR on various data modalities, including image, voxels, scene, climate, audio, and video datasets. Overall, our experimental results demonstrate strong results, consistently outperforming previous INR-based compression methods and improving on popular compression schemes such as MP3 on audio and AVC/HEVC on video clips. In particular, VC-INR achieves new state-of-the-art results on modality-agnostic compression with INRs, improving the Peak Signal to Noise Ratio (PSNR) at the same bits-per-pixel (bpp) bit rate by 3.3 dB for CIFAR-10 (Krizhevsky et al., 2009), by 2 dB on Kodak (both images), 3.5 dB for ERA5 (climate data) (Hersbach et al., 2019) and 9.5 dB for LibriSpeech (audio) (Panayotov et al., 2015) respectively. |
| Dataset Splits | Yes | Following (Dupont et al., 2022a), we divide the dataset into 27,000 training examples and 3,000 test examples, and pre-process the pixel coordinates into [0, 1]² and feature values ranging from 0 to 1. For Meta-Learning, we also train the model on randomly cropped 32×32 patches, and for evaluation we split the image into non-overlapping patches, where modulations are adapted on each patch (see the patch-splitting sketch below the table). |
| Hardware Specification | No | The paper mentions "Num devices {8}" in the hyperparameters tables, which is too general and does not specify concrete hardware components like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software components such as SIREN, Adam, SELU activation, and LayerNorm, but it does not provide specific version numbers for any of these libraries or frameworks. |
| Experiment Setup | Yes | Appendix G. Hyperparameters, including Table 3. Hyperparameters for compression experiments on CIFAR-10, Table 4. Hyperparameters for compression experiments on Div2k/Kodak, Table 5. Hyperparameters for compression experiments on ERA5 (16×), Table 6. Hyperparameters for compression experiments on LibriSpeech, and Table 7. Hyperparameters for compression experiments on UCF-101. These tables detail various parameters such as 'Batch size per device', 'Outer learning rate', 'Num inner steps', 'Network depth', 'Network width', 'dim(ϕ)', 'λ (Ldistortion penalty)', etc. |
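The Pseudocode row refers to Algorithms 1 and 2, which meta-learn shared INR weights in an outer loop while adapting per-example latent modulations in an inner loop. The following is a minimal sketch of that general pattern only, not the authors' released code: the function names (`apply_inr`, `adapt_modulations`, `outer_loss`), the single modulated sine layer, and the plain MSE objective are illustrative assumptions.

```python
# Illustrative sketch (not the paper's implementation): inner-loop adaptation
# of per-example modulations phi with shared INR parameters theta.
import jax
import jax.numpy as jnp

def apply_inr(theta, phi, coords):
    # Hypothetical modulated sine layer: the shared bias is shifted by phi.
    w, b = theta
    return jnp.sin(coords @ w + b + phi)

def mse(theta, phi, coords, targets):
    pred = apply_inr(theta, phi, coords)
    return jnp.mean((pred - targets) ** 2)

def adapt_modulations(theta, coords, targets, num_steps=3, inner_lr=1e-2):
    # Inner loop: only the modulations phi are updated for a given example.
    phi = jnp.zeros(theta[1].shape)
    for _ in range(num_steps):
        grads = jax.grad(mse, argnums=1)(theta, phi, coords, targets)
        phi = phi - inner_lr * grads
    return phi

def outer_loss(theta, coords, targets):
    # Outer objective: reconstruction error after inner adaptation,
    # differentiated with respect to the shared parameters theta.
    phi = adapt_modulations(theta, coords, targets)
    return mse(theta, phi, coords, targets)
```

Meta-gradients for the shared weights would then be obtained with `jax.grad(outer_loss)(theta, coords, targets)` and applied with the outer learning rate listed in the hyperparameter tables.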
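The Dataset Splits row describes training on random 32×32 crops and evaluating on non-overlapping patches, with modulations adapted per patch. A minimal sketch of the non-overlapping split, assuming an (H, W, C) image whose height and width are divisible by the patch size (the helper name is ours, not the paper's):

```python
import jax.numpy as jnp

def split_into_patches(image, patch_size=32):
    # Split an (H, W, C) image into non-overlapping (patch_size, patch_size, C)
    # patches; assumes H and W are multiples of patch_size.
    h, w, c = image.shape
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = jnp.transpose(patches, (0, 2, 1, 3, 4))
    return patches.reshape(-1, patch_size, patch_size, c)
```

Each patch would then receive its own adapted modulation vector at evaluation time, with the per-patch reconstructions tiled back to form the full image.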