Universally Quantized Neural Compression

Authors: Eirikur Agustsson, Lucas Theis

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments with two models: (a) a simple linear model and (b) a more complex model based on the hyperprior architecture proposed by Ballé et al. [6] and extended by Minnen et al. [23]. We evaluate all models on the Kodak [20] dataset by computing the rate-distortion (RD) curve in terms of bits-per-pixel (bpp) versus peak signal-to-noise ratio (PSNR). (See the evaluation sketch below the table.)
Researcher Affiliation | Industry | Eirikur Agustsson (Google Research, eirikur@google.com); Lucas Theis (Google Research, theis@google.com)
Pseudocode | No | No pseudocode or algorithm blocks are present.
Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of source code for the described methodology.
Open Datasets | Yes | We evaluate all models on the Kodak [20] dataset by computing the rate-distortion (RD) curve in terms of bits-per-pixel (bpp) versus peak signal-to-noise ratio (PSNR). [20] Kodak. Photo CD PCD0992, 1993. URL http://r0k.us/graphics/kodak/.
Dataset Splits | No | The paper does not specify explicit training/validation/test dataset splits. It mentions using "256x256 pixel crops extracted from a set of 1M high resolution JPEG images" for training and evaluating on the "Kodak [20] dataset", but no specific split percentages or counts are provided for these datasets.
Hardware Specification | Yes | The training time was about 30 hours for the linear models and about 60 hours for the hyperprior models on an Nvidia V100 GPU.
Software Dependencies | No | The paper mentions using the "Adam optimizer [19]" but does not specify any software versions for libraries, frameworks, or programming languages (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | We optimized all models for mean squared error (MSE). The Adam optimizer [19] was applied for 2M steps with a batch size of 8 and a learning rate of 10^-4, which is reduced to 10^-5 after 1.6M steps. For the first 5,000 steps only the density models were trained and the learning rates of the encoder and decoder transforms were kept at zero. For the hyperprior models we set λ = 2^i for i ∈ {−6, ..., 1} and decayed it by a factor of 1/10 after 200k steps. For the linear models we use slightly smaller λ = 0.4 · 2^i and reduced it by a factor of 1/2 after 100k steps and again after 200k steps. For soft rounding we linearly annealed the parameter from 1 to 16 over the full 2M steps.
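
The schedules quoted in the Experiment Setup row are easy to misread when written inline, so the sketch below maps them to a global training step. The paper does not state which deep-learning framework was used, so these are plain Python helpers; the function names (learning_rate, transform_learning_rate, rd_weight, soft_rounding_alpha) are illustrative and not taken from the paper or any released code.

```python
# Illustrative helpers for the schedules quoted in the Experiment Setup row.
# All constants come from that row; the function names are made up, and the
# schedules are treated as independent since the paper states no interaction.

def learning_rate(step: int) -> float:
    """Adam learning rate: 1e-4, reduced to 1e-5 after 1.6M of the 2M steps."""
    return 1e-4 if step < 1_600_000 else 1e-5

def transform_learning_rate(step: int) -> float:
    """Encoder/decoder transforms are kept frozen for the first 5,000 steps."""
    return 0.0 if step < 5_000 else learning_rate(step)

def rd_weight(step: int, i: int, linear_model: bool = False) -> float:
    """Rate-distortion weight lambda.

    Hyperprior models: lambda = 2**i for i in {-6, ..., 1}, decayed by a
    factor of 1/10 after 200k steps. Linear models: lambda = 0.4 * 2**i,
    halved after 100k steps and again after 200k steps.
    """
    if linear_model:
        lam = 0.4 * 2.0 ** i
        if step >= 100_000:
            lam *= 0.5
        if step >= 200_000:
            lam *= 0.5
        return lam
    lam = 2.0 ** i
    if step >= 200_000:
        lam *= 0.1
    return lam

def soft_rounding_alpha(step: int, total_steps: int = 2_000_000) -> float:
    """Soft-rounding parameter, linearly annealed from 1 to 16 over all 2M steps."""
    t = min(step, total_steps) / total_steps
    return 1.0 + 15.0 * t

# Example: scheduled values halfway through training of a hyperprior model with i = -2.
step = 1_000_000
print(learning_rate(step), rd_weight(step, i=-2), soft_rounding_alpha(step))
```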
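
For the evaluation referenced in the Research Type row, the following sketch shows how one point of a bpp-versus-PSNR rate-distortion curve is typically computed, averaged over a test set such as the 24 Kodak images, with one point per trained model tracing the curve. It assumes NumPy and a hypothetical compress callable returning a bitstring and a reconstruction; neither is from released code, since the paper links none.

```python
# Sketch of a bpp/PSNR evaluation loop. `compress` is a hypothetical codec
# interface (image -> (bitstring, reconstruction)) standing in for the
# models described in the paper.
import numpy as np

def psnr(original, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_value]."""
    mse = np.mean((original.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)

def bits_per_pixel(num_bits, image):
    """Bitstream length in bits divided by the number of pixels (height * width)."""
    height, width = image.shape[:2]
    return num_bits / (height * width)

def rd_point(images, compress):
    """Average (bpp, PSNR) over a test set; one such point per model gives the RD curve."""
    bpps, psnrs = [], []
    for image in images:
        bitstring, reconstruction = compress(image)
        bpps.append(bits_per_pixel(8 * len(bitstring), image))
        psnrs.append(psnr(image, reconstruction))
    return float(np.mean(bpps)), float(np.mean(psnrs))
```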