Fast Lossless Neural Compression with Integer-Only Discrete Flows

Authors: Siyu Wang, Jianfei Chen, Chongxuan Li, Jun Zhu, Bo Zhang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To illustrate the efficiency and capacity of IODF, we conduct two sets of experiments regarding the architecture design, filter pruning, and integer-only inference. The models are trained with a PyTorch (Paszke et al., 2019) implementation, and latency results are measured by deploying on a Tesla T4 GPU with the TensorRT library. Density estimation performance is reported in bits per dimension (bpd). We compare IODF with IDF on the ImageNet32 and ImageNet64 (Deng et al., 2009) datasets. (A sketch of the bpd conversion follows the table.)
Researcher Affiliation | Collaboration | (1) Dept. of Comp. Sci. & Tech., BNRist Center, Tsinghua-Bosch Joint ML Center, Tsinghua University; Peng Cheng Laboratory. (2) Gaoling School of AI, Renmin University of China; Beijing Key Lab of Big Data Management & Analysis Methods, Beijing, China. Correspondence to: Jianfei Chen <jianfeic@tsinghua.edu.cn>, Jun Zhu <dcszj@tsinghua.edu.cn>.
Pseudocode | Yes | Algorithm 1: Training IODF
Open Source Code | Yes | Open-source code is available at https://github.com/thu-ml/IODF.
Open Datasets | Yes | We use the down-sampled ImageNet datasets from https://image-net.org/data/downsample/Imagenet32_train.zip, following Grcić et al. (2021); Hazami et al. (2022). (A hedged batch-loading sketch follows the table.)
Dataset Splits | No | Table 2. Overall evaluation results on test datasets (measured in bits per dimension) of IDF-DenseNets, IDF-ResNets, and pruned models with different FLOPs pruning ratios on ImageNet32 and ImageNet64. The paper mentions training and testing datasets, but does not explicitly provide details about a validation split or how it was used in training beyond the general epoch counts.
Hardware Specification | Yes | We train IODF using 8 Nvidia RTX 2080Ti GPUs. We build the inference engine and evaluate the latency on a Tesla T4 GPU and an Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz with TensorRT 8.2.0.6 and CUDA 10.2.
Software Dependencies | Yes | The code for our experiments is implemented with PyTorch (Paszke et al., 2019). ... We build the inference engine and evaluate the latency on a Tesla T4 GPU and an Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz with TensorRT 8.2.0.6, CUDA 10.2. (A TensorRT engine-build sketch follows the table.)
Experiment Setup | Yes | The models are trained for 100 epochs on ImageNet32 and 50 epochs on ImageNet64. See Appendix B.1 for architecture and training details. ... Table 5. IDF-ResNets architecture and optimization parameters for each experiment. ... When training the model with gated convolutions, we initialize gates with α = 0.8 and set lr = 0.00005, lr decay = 0.99. ... For the quantized model, we initialize scale parameters in the quantizers data-dependently... We set lr = 1e-4, lr decay = 0.99 in simulated quantization training, and quantized models are fine-tuned for 10 epochs. (An optimizer and LR-decay sketch follows the table.)
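
Bits per dimension (bpd), the density-estimation metric quoted above, is the per-sample negative log-likelihood converted to base 2 and divided by the number of data dimensions. The snippet below is a minimal sketch of that conversion, assuming the model reports its NLL in nats; the function name and the numeric value are illustrative, not taken from the IODF codebase.

```python
import math
import torch

def bits_per_dimension(nll_nats: torch.Tensor, num_dims: int) -> torch.Tensor:
    """Convert a per-sample negative log-likelihood (in nats) to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))

# Illustrative only: a 32x32x3 ImageNet32 image has 3 * 32 * 32 = 3072 dimensions.
nll = torch.tensor([8300.0])                   # hypothetical NLL in nats for one image
print(bits_per_dimension(nll, 3 * 32 * 32))    # ~3.90 bpd; lower is better
```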
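The downsampled ImageNet archive referenced in the Open Datasets row unpacks into pickled training batches (train_data_batch_1, ...). The loader below is a hedged sketch assuming the common CIFAR-style layout, i.e. each batch is a pickled dict with a uint8 'data' array of shape (N, 3072); the path and layout are assumptions, not taken from the IODF repository.

```python
import pickle

def load_imagenet32_batch(path):
    """Load one downsampled-ImageNet batch, assumed to be a pickled dict
    holding a uint8 'data' array of shape (N, 3072) in channel-major order."""
    with open(path, "rb") as f:
        batch = pickle.load(f)  # add encoding="latin1" if the pickle is Python-2 era
    images = batch["data"].reshape(-1, 3, 32, 32)  # NCHW, ready for PyTorch models
    return images

# Hypothetical path after unzipping Imagenet32_train.zip
images = load_imagenet32_batch("Imagenet32_train/train_data_batch_1")
print(images.shape, images.dtype)  # expected: (N, 3, 32, 32) uint8
```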
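Latency is reported after deploying on a Tesla T4 with TensorRT 8.2.0.6. The snippet below is a generic TensorRT 8.x engine-build sketch for an INT8 model exported to ONNX, not the authors' deployment code: the file names are placeholders, and it assumes quantization scales are already embedded in the graph (e.g. as Q/DQ nodes), so no separate calibrator is shown.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("iodf.onnx", "rb") as f:            # placeholder ONNX export
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)         # request INT8 kernels where supported

serialized = builder.build_serialized_network(network, config)
with open("iodf_int8.engine", "wb") as f:     # placeholder engine file
    f.write(serialized)
```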
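The learning-rate settings quoted in the Experiment Setup row pair a base lr with a multiplicative lr decay of 0.99. Below is a hedged sketch of how that setup might be wired in PyTorch: the choice of Adam, the stand-in module, and the per-epoch decay stepping are assumptions, while the numeric values (gate init 0.8, lr = 5e-5 / 1e-4, decay 0.99, 10 fine-tuning epochs) come from the quoted setup.

```python
import torch
from torch import nn, optim

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())  # stand-in for an IODF block

def make_optimizer(params, lr, lr_decay):
    """Optimizer with multiplicative LR decay (Adam and per-epoch stepping are assumptions)."""
    opt = optim.Adam(params, lr=lr)
    sched = optim.lr_scheduler.ExponentialLR(opt, gamma=lr_decay)
    return opt, sched

# Gated-convolution training stage: gate parameters initialized to 0.8 (alpha in the quoted setup).
gates = nn.Parameter(torch.full((64,), 0.8))
opt, sched = make_optimizer(list(model.parameters()) + [gates], lr=5e-5, lr_decay=0.99)

# Simulated-quantization training / fine-tuning stage (reported as 10 epochs).
opt_q, sched_q = make_optimizer(model.parameters(), lr=1e-4, lr_decay=0.99)
for epoch in range(10):
    # ... one pass over the training set goes here ...
    sched_q.step()  # apply the 0.99 decay once per epoch (assumption)
```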