IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Authors: Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments we found that neither stochastic rounding nor replacing the identity function in the straight-through estimator with a soft approximation of the rounding function improved the results. We compare continuous flow models that are trained using the unbiased gradient ∇_θ L with discrete flow models that are trained using the straight-through gradient estimator g_st. Table 1: Compression results in bits per dimension (bpd) for IDF++, hand-designed codecs and other deep density estimators based on normalizing flows, super-resolution and variational auto-encoders. (A minimal sketch of straight-through rounding appears after the table.) |
| Researcher Affiliation | Industry | Google Research {riannevdberg,agritsenko,dehghani,casperkaae,salimans}@google.com |
| Pseudocode | No | The paper describes algorithmic steps in text but does not contain formally labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing source code for the described methodology or provide a repository link. |
| Open Datasets | Yes | The training set of CIFAR-10 consists of 50000 images and the test set contains 10000 images. ImageNet-32 and ImageNet-64 contain approximately 1250000 train images and 50000 test images. Table 1: Compression results in bits per dimension (bpd) for IDF++, hand-designed codecs and other deep density estimators based on normalizing flows, super-resolution and variational auto-encoders. Where available, the bpd according to the model's negative log-likelihood is indicated in parentheses. Some results are taken from Townsend et al. (2019a) and others from Hoogeboom et al. (2019a), as marked in the table. (A sketch of the nats-to-bpd conversion appears after the table.) |
| Dataset Splits | Yes | Figure 3 shows the performance of models with the proposed modifications (DenseNet++) on the validation set (consisting of 20% of the training set) on CIFAR-10 as a function of flows per level, after 300K iterations. To make a fair comparison against other methods like local bits-back coding (LBB) by Ho et al. (2019b), we train our final models on the entire training set without holding out part of the training set as a validation set. The training set of CIFAR-10 consists of 50000 images and the test set contains 10000 images. (A hold-out split sketch appears after the table.) |
| Hardware Specification | Yes | All experiments were run with 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions 'TensorFlow (Abadi et al., 2015)' but does not specify a version number, nor does it provide version numbers for other software dependencies. |
| Experiment Setup | Yes | The model is trained with the Adamax optimizer (Kingma & Ba, 2014) using an exponential learning rate schedule with a base learning rate of 1 × 10⁻³ and a linear warmup phase of 10 epochs. See Table 2 for more details on the learning rate decay, the number of levels, the batch size and the number of epochs used for training. (A schedule sketch appears after the table.) |
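For readers unfamiliar with the straight-through estimator quoted in the Research Type row: below is a minimal sketch of straight-through rounding, assuming the standard formulation (round in the forward pass, identity in the backward pass). The function name `round_ste` is ours for illustration, not from the paper.

```python
import tensorflow as tf

def round_ste(x):
    """Round x in the forward pass; pass gradients through unchanged.

    The stop_gradient trick makes the output numerically equal to
    tf.round(x), while d(output)/dx is the identity -- the
    straight-through gradient estimator g_st.
    """
    return x + tf.stop_gradient(tf.round(x) - x)

# Example: gradients flow through the rounding as if it were identity.
x = tf.Variable([0.3, 1.7, -0.6])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(round_ste(x) ** 2)
print(tape.gradient(y, x))  # 2 * round(x) = [0., 4., -2.], not zero
```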
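Table 1's metric, bits per dimension, is the model's negative log-likelihood rescaled per pixel channel. A small illustrative helper (names and the example NLL value are ours) for converting a per-image NLL in nats to bpd:

```python
import numpy as np

def bits_per_dimension(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood (in nats) to bpd."""
    return nll_nats / (num_dims * np.log(2.0))

# A 32x32 RGB CIFAR-10 image has 32 * 32 * 3 = 3072 dimensions, so an
# (arbitrary, illustrative) NLL of 6900 nats is roughly 3.24 bpd.
print(bits_per_dimension(6900.0, 32 * 32 * 3))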
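The Dataset Splits row cites a validation set of 20% of the CIFAR-10 training set, but the quoted text does not specify how the hold-out is drawn. A sketch assuming a simple random split, with all names ours:

```python
import numpy as np

def train_val_split(num_train, val_fraction=0.2, seed=0):
    """Hold out a random fraction of training indices as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_train)
    n_val = int(val_fraction * num_train)
    return idx[n_val:], idx[:n_val]  # train indices, validation indices

train_idx, val_idx = train_val_split(50000)  # CIFAR-10: 40000 / 10000
```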
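The Experiment Setup row describes Adamax with a linear warmup followed by exponential learning rate decay; the paper defers the actual decay constants, batch sizes, and epoch counts to its Table 2, so in the sketch below only the 1e-3 base rate and 10-epoch warmup come from the quoted text, and the remaining numbers are placeholders.

```python
import tensorflow as tf

class WarmupExpDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Linear warmup to base_lr, then per-epoch exponential decay.

    base_lr = 1e-3 and the 10-epoch warmup follow the paper; the
    steps-per-epoch and decay rate here are assumed placeholders.
    """

    def __init__(self, base_lr=1e-3, steps_per_epoch=500,
                 warmup_epochs=10, decay_rate=0.99):
        self.base_lr = base_lr
        self.steps_per_epoch = float(steps_per_epoch)
        self.warmup_steps = float(steps_per_epoch * warmup_epochs)
        self.decay_rate = decay_rate

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        warmup_lr = self.base_lr * step / self.warmup_steps
        epochs_after = (step - self.warmup_steps) / self.steps_per_epoch
        decayed_lr = self.base_lr * self.decay_rate ** tf.maximum(epochs_after, 0.0)
        return tf.where(step < self.warmup_steps, warmup_lr, decayed_lr)

optimizer = tf.keras.optimizers.Adamax(learning_rate=WarmupExpDecay())
```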