LDMIC: Learning-based Distributed Multi-view Image Coding
Authors: Xinjie Zhang, Jiawei Shao, Jun Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that LDMIC significantly outperforms both traditional and learning-based MIC methods while enjoying fast encoding speed. Code is released at https://github.com/Xinjie-Q/LDMIC. |
| Researcher Affiliation | Academia | Xinjie Zhang, Jiawei Shao, Jun Zhang The Hong Kong University of Science and Technology, Hong Kong, China {xinjie.zhang, jiawei.shao}@connect.ust.hk, eejzhang@ust.hk |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is released at https://github.com/Xinjie-Q/LDMIC. |
| Open Datasets | Yes | To compare with recently developed learning-based stereo image compression methods, two common stereo image datasets, i.e., InStereo2K (Bao et al., 2020) and Cityscapes (Cordts et al., 2016), are chosen to evaluate the coding efficiency of the proposed framework. Apart from testing stereo image datasets related to 3D scenes, we also select a pedestrian surveillance dataset, i.e., WildTrack (Chavdarova et al., 2018), acquired by seven randomly placed cameras with overlapping FoV, to demonstrate the potential of our proposed framework in distributed camera systems without an epipolar geometry relationship between images. |
| Dataset Splits | Yes | The Cityscapes dataset comprises 5000 image pairs of far views and outdoor scenes, split into 2975 training, 500 validation and 1525 testing pairs. |
| Hardware Specification | Yes | The whole framework is implemented with CompressAI (Bégaint et al., 2020) and trained on a machine with an NVIDIA RTX 3090 GPU. ... Table 2 shows the computational complexity of seven image codecs running on an Intel Xeon Gold 6230R processor with a base frequency of 2.10 GHz and a single CPU core. |
| Software Dependencies | Yes | We use the HM-16.25 and HTM-16.3 reference software to evaluate the coding efficiency of HEVC and MV-HEVC on the WildTrack dataset, respectively. In addition, we run VTM-17.0 to test VVC-intra and VVC. |
| Experiment Setup | Yes | We train our models with five different λ values, λ = 256, 512, 1024, 2048, 4096 (8, 16, 32, 64, 128) under MSE (MS-SSIM). The MSE-optimized models are trained from scratch for 400 epochs on InStereo2K/Cityscapes and 700 epochs on WildTrack using the Adam optimizer (Kingma & Ba, 2014) with a batch size of 8. The learning rate is initially set to 10^-4 and halved every 100 epochs until training reaches 400 epochs. For the MS-SSIM-optimized models, we fine-tune the MSE-optimized networks for 300 (400) epochs with an initial learning rate of 5 × 10^-5 on the stereo (multi-camera) image datasets. During training, each image is randomly flipped and cropped to 256 × 256 for data augmentation. A training-loop sketch based on this description follows the table. |
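As a concrete reading of the setup row above, the sketch below assembles the reported hyperparameters into a PyTorch training loop. It is not the authors' released code (see the repository linked above for that): the `rate_distortion_loss` helper, the exact λ scaling of the distortion term, and a CompressAI-style model interface returning `{'x_hat', 'likelihoods'}` are illustrative assumptions, and the loop feeds single images rather than the stereo/multi-view groups LDMIC actually encodes.

```python
import math

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import transforms

# Hyperparameters quoted from the paper (MSE-optimized models).
LMBDA = 2048      # one of {256, 512, 1024, 2048, 4096} for MSE
BATCH_SIZE = 8
EPOCHS = 400      # 400 on InStereo2K/Cityscapes, 700 on WildTrack

# Data augmentation: random flip plus a random 256x256 crop.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(256),
    transforms.ToTensor(),
])


def rate_distortion_loss(output, target, lmbda):
    """Hypothetical RD loss: lmbda * MSE + bits-per-pixel. Assumes the
    CompressAI convention that the model's forward pass returns a dict
    {'x_hat': reconstruction, 'likelihoods': {name: tensor, ...}}."""
    n, _, h, w = target.size()
    num_pixels = n * h * w
    # Rate term: total bits of all latents divided by the pixel count.
    bpp = sum(torch.log(lk).sum() / (-math.log(2) * num_pixels)
              for lk in output["likelihoods"].values())
    distortion = F.mse_loss(output["x_hat"], target)
    return lmbda * distortion + bpp  # exact lambda scaling is an assumption


def train(model, dataset, device="cuda"):
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Halve the learning rate every 100 epochs, per the paper's schedule.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100,
                                                gamma=0.5)
    model.to(device).train()
    for epoch in range(EPOCHS):
        for x in loader:
            x = x.to(device)
            optimizer.zero_grad()
            loss = rate_distortion_loss(model(x), x, LMBDA)
            loss.backward()
            optimizer.step()
        scheduler.step()
```

The MS-SSIM stage described in the same row would mirror this loop with the distortion term swapped for 1 − MS-SSIM, λ ∈ {8, 16, 32, 64, 128}, and an initial learning rate of 5 × 10^-5, initialized from the MSE checkpoints.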