Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

Authors: Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, Luc Van Gool

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: Image compression and neural network compression. While these tasks have typically been approached with different methods, our soft-to-hard quantization approach gives results competitive with the state-of-the-art for both. (A code sketch of this soft-to-hard relaxation appears after this table.)
Researcher Affiliation | Collaboration | Eirikur Agustsson (ETH Zurich, aeirikur@vision.ee.ethz.ch); Fabian Mentzer (ETH Zurich, mentzerf@vision.ee.ethz.ch); Michael Tschannen (ETH Zurich, michaelt@nari.ee.ethz.ch); Lukas Cavigelli (ETH Zurich, cavigelli@iis.ee.ethz.ch); Radu Timofte (ETH Zurich & Merantix, timofter@vision.ee.ethz.ch); Luca Benini (ETH Zurich, benini@iis.ee.ethz.ch); Luc Van Gool (KU Leuven & ETH Zurich, vangool@vision.ee.ethz.ch)
Pseudocode | No | The paper describes mathematical formulations and algorithmic steps in prose and equations (e.g., Section 3.2, 'Our Method'). However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' block or figure.
Open Source Code | No | The paper does not provide any specific links to source code repositories or explicit statements confirming the release of source code for the described methodology. It does not mention supplementary material containing code for the methodology.
Open Datasets | Yes | Our training set is composed similarly to that described in [4]. We used a subset of 90,000 images from ImageNet [9]. To evaluate the image compression performance of our Soft-to-Hard Vector Quantization Autoencoder (SHVQ) method we use four datasets, namely Kodak [2], B100 [31], Urban100 [14], ImageNet100 (100 randomly selected images from ImageNet [25]). For DNN compression, we investigate the ResNet [13] architecture for image classification. We adopt the same setting as [6] and consider a 32-layer architecture trained for CIFAR-10 [18].
Dataset Splits | Yes | Our training set is composed similarly to that described in [4]. We used a subset of 90,000 images from ImageNet [9], which we downsampled by a factor 0.7 and trained on crops of 128 x 128 pixels, with a batch size of 15. To estimate the probability distribution p for optimizing (8), we maintain a histogram over 5,000 images, which we update every 10 iterations with the images from the current batch. (A sketch of such a running histogram appears after this table.)
Hardware Specification | Yes | Our full Image Compression Autoencoder has 6.37M trainable parameters, trained using Adam [17] for 300,000 iterations on an NVIDIA Titan X GPU using tensorflow.
Software Dependencies | No | The paper mentions using 'tensorflow' in Appendix A.2, but it does not specify a version number for tensorflow or any other software libraries or dependencies. It only mentions 'Adam [17]' as an optimizer, which is a method, not a specific software package with a version.
Experiment Setup | Yes | We trained different models using Adam [17], see Appendix A.2. We used a subset of 90,000 images from ImageNet [9], which we downsampled by a factor 0.7 and trained on crops of 128 x 128 pixels, with a batch size of 15. Our full Image Compression Autoencoder... trained using Adam [17] for 300,000 iterations... We used a learning rate schedule of 1e-4 for 250k iterations, then 1e-5 for 50k iterations. We implemented the entropy minimization by using L = 75 centers and chose β = 0.1... The training was performed with the same learning parameters as the original model was trained with (SGD with momentum 0.9). The annealing schedule used was a simple exponential one, σ(t + 1) = 1.001 σ(t) with σ(0) = 0.4. (The quoted schedules are sketched in code after this table.)
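
To make the soft-to-hard relaxation quoted under Research Type concrete, here is a minimal NumPy sketch of soft assignments to quantization centers with an annealing temperature, plus the soft entropy of the averaged assignments. The names (soft_quantize, centers, sigma) and the exact formulation are our own illustrative choices, not the authors' released code.

```python
import numpy as np

def soft_quantize(z, centers, sigma):
    """Soft relaxation of nearest-center (vector) quantization.

    z:       (N, d) feature vectors to be quantized
    centers: (L, d) learnable quantization centers
    sigma:   annealing temperature; as sigma grows, the soft assignments
             approach hard (one-hot) nearest-center assignments
    """
    # Squared distances between every feature vector and every center: (N, L)
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # Soft assignments: softmax over negative scaled distances
    logits = -sigma * d2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    phi = np.exp(logits)
    phi /= phi.sum(axis=1, keepdims=True)
    z_soft = phi @ centers                        # differentiable surrogate
    z_hard = centers[d2.argmin(axis=1)]           # discrete quantization
    return z_soft, z_hard, phi

def soft_entropy_bits(phi):
    """Entropy (in bits) of the averaged soft assignment distribution."""
    p = np.clip(phi.mean(axis=0), 1e-12, 1.0)
    return float(-(p * np.log2(p)).sum())
```

During training, z_soft would be used in the forward and backward passes while sigma is annealed upward, so the soft assignments converge to the hard ones; a weighted soft entropy term (β times soft_entropy_bits) pushes the representation toward low-entropy, compressible symbol statistics.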
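The Dataset Splits entry mentions a histogram over 5,000 images, updated every 10 iterations, for estimating the symbol distribution p used in the entropy objective. Below is a rough sketch of such a running histogram; the class name, window-size default, and interfaces are our assumptions and not taken from the authors' code.

```python
from collections import deque

import numpy as np

class RunningSymbolHistogram:
    """Estimates the center-usage distribution p over a sliding window of images."""

    def __init__(self, num_centers, max_images=5000):
        self.num_centers = num_centers
        self.per_image_counts = deque(maxlen=max_images)  # oldest images drop out

    def update(self, batch_assignments):
        """batch_assignments: iterable of 1-D integer arrays of center indices,
        one array per image in the current batch."""
        for indices in batch_assignments:
            counts = np.bincount(indices, minlength=self.num_centers)
            self.per_image_counts.append(counts)

    def distribution(self):
        """Normalized histogram over the buffered images (the estimate of p)."""
        total = np.sum(self.per_image_counts, axis=0).astype(np.float64)
        return total / max(total.sum(), 1.0)
```

In the quoted setup, update() would be called every 10 iterations with the assignments of the images in the current batch, and the resulting distribution plugged into the entropy term of the loss.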
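Finally, the schedules quoted under Experiment Setup translate directly into code. The constants below are taken from the quoted text; the helper names are ours.

```python
def learning_rate(iteration):
    """Step schedule: 1e-4 for the first 250k iterations, then 1e-5 (up to 300k)."""
    return 1e-4 if iteration < 250_000 else 1e-5

def annealing_sigma(t, sigma0=0.4, growth=1.001):
    """Exponential annealing sigma(t + 1) = 1.001 * sigma(t) with sigma(0) = 0.4."""
    return sigma0 * growth ** t
```

The exponential schedule keeps sigma small early on, so gradients flow through soft, spread-out assignments, and lets it grow over training so that the soft quantization gradually hardens toward its discrete counterpart.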