Channel-Level Variable Quantization Network for Deep Image Compression
Authors: Zhisheng Zhong, Hiroaki Akutsu, Kiyoharu Aizawa
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Quantitative and qualitative experiments verify the effectiveness of the proposed model and demonstrate that our method achieves superior performance and can produce much better visual reconstructions. (Section 4: Experiments) |
| Researcher Affiliation | Collaboration | Zhisheng Zhong (The University of Tokyo, Japan), Hiroaki Akutsu (Hitachi Ltd., Japan), and Kiyoharu Aizawa (The University of Tokyo, Japan) |
| Pseudocode | No | No pseudocode or explicit algorithm block is present in the paper. |
| Open Source Code | Yes | Code address: https://github.com/zzs1994/CVQN |
| Open Datasets | Yes | We merge three common datasets, namely DIV2K [Timofte et al., 2017], Flickr2K [Lim et al., 2017], and CLIC2018, to form our training dataset, which contains approximately 4,000 images in total. Following many deep image compression methods, we evaluate our models on the Kodak dataset with the MS-SSIM metric for lossy image compression. |
| Dataset Splits | No | We construct a validation dataset by randomly selecting N images from the training dataset. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for experiments are provided. |
| Software Dependencies | No | No specific software versions (e.g., Python, PyTorch, TensorFlow, CUDA) or library versions are mentioned. |
| Experiment Setup | Yes | Parameter setting. In our experiments, we use the Adam optimizer [Kingma and Ba, 2015] with a mini-batch size M of 32 to train our five models on 256×256 image patches. We vary the quantized feature map Ẑ's channel number C from 16 to 80 to obtain different BPPs. The total number of training epochs equals 400. The initial learning rates are set to 1×10⁻⁴, 1×10⁻⁴, 5×10⁻⁵, and 1×10⁻⁴ for the encoder, quantizer, entropy model, and decoder, respectively. We reduce them twice (at Epoch-200 and Epoch-300) by a factor of five during training. In the channel attention residual en/decoder, we set the number of residual channel attention blocks B = 6 for all stages. The channel numbers for each stage in the encoder are 32, 64, 128, and 192, respectively, whereas those for each stage in the decoder are 192, 128, 64, and 32, respectively. In the variable quantization controller, we set the number of groups G = 3. The ratio vector r = [25%, 50%, 25%]. For loss function Eqn. (8), we choose negative MS-SSIM for the distortion loss Ldis and α = 128; we select cross entropy for the entropy estimation loss Lent and β = 0.001. (See the configuration sketch below the table.) |
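To make the quoted setup concrete, below is a minimal PyTorch sketch of the optimizer grouping, learning-rate schedule, and loss weighting described in the Experiment Setup row. The stand-in module definitions, the placeholder rate estimate, and the pytorch_msssim dependency are illustrative assumptions, not the authors' implementation (which is available at the repository linked above).

```python
import torch
from torch import nn, optim
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim (assumed helper)

# Stand-in modules for the paper's channel-attention encoder/decoder,
# variable quantizer, and entropy model; the real implementations are
# in the linked repository (https://github.com/zzs1994/CVQN).
encoder = nn.Conv2d(3, 192, kernel_size=3, padding=1)
quantizer = nn.Conv2d(192, 192, kernel_size=1)
entropy_model = nn.Conv2d(192, 192, kernel_size=3, padding=1)
decoder = nn.Conv2d(192, 3, kernel_size=3, padding=1)

# Adam with the per-module initial learning rates quoted in the setup.
optimizer = optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-4},
    {"params": quantizer.parameters(), "lr": 1e-4},
    {"params": entropy_model.parameters(), "lr": 5e-5},
    {"params": decoder.parameters(), "lr": 1e-4},
])

# "Reduce them twice (at Epoch-200 and Epoch-300) by a factor of five":
# gamma = 0.2 is exactly a factor-of-five cut at each milestone.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[200, 300], gamma=0.2)

ALPHA, BETA = 128, 0.001  # weights alpha and beta from Eqn. (8)

def total_loss(x, x_hat, rate_bits):
    """L = alpha * L_dis + beta * L_ent, with negative MS-SSIM as L_dis."""
    l_dis = -ms_ssim(x_hat, x, data_range=1.0)
    return ALPHA * l_dis + BETA * rate_bits

# One illustrative step on random 256x256 patches (the paper uses
# mini-batches of 32; a batch of 4 keeps this demo light).
x = torch.rand(4, 3, 256, 256)
z_hat = quantizer(encoder(x))
x_hat = torch.sigmoid(decoder(z_hat))
rate_bits = entropy_model(z_hat).abs().mean()  # placeholder rate estimate
loss = total_loss(x, x_hat, rate_bits)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # advance once per epoch over the 400-epoch schedule
```

The five models at different BPPs would be obtained by repeating this training with C (the channel count of the quantized feature map Ẑ) varied from 16 to 80, per the quoted setup.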