End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy
Authors: Huiming Zheng, Wei Gao
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate our method outperforms existing image compression methods on two RGB-D image datasets. |
| Researcher Affiliation | Academia | School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen, China; Peng Cheng Laboratory, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'CAAI-MindSpore Open Fund, developed on OpenI Community', but does not explicitly state that the source code for their proposed method is released, nor does it provide a direct link to it. |
| Open Datasets | Yes | The SUN-RGBD dataset (Song, Lichtenberg, and Xiao 2015) is a widely used computer vision research dataset for indoor scene understanding and depth perception tasks. The NYU-Depth V2 dataset (Chodosh, Wang, and Lucey 2019) comprises video sequences capturing diverse indoor scenes recorded by the RGB and depth cameras of Microsoft Kinect. |
| Dataset Splits | Yes | [SUN-RGBD] For training, 8,000 image pairs were randomly selected, while 1,000 image pairs were chosen for validation, and an additional 1,000 image pairs were reserved for testing. [NYU-Depth V2] We divide the entire dataset into three parts: 1,159 image pairs for training, 145 image pairs for validation, and 145 image pairs for testing. |
| Hardware Specification | Yes | The training stage takes about ten days on a Tesla V100. |
| Software Dependencies | No | The paper mentions a 'CUDA-enabled PyTorch implementation' but does not provide specific version numbers for PyTorch, CUDA, or other software dependencies. |
| Experiment Setup | Yes | We set different values for the hyperparameter λ to control the bit rate. The Adam optimizer (Kingma and Ba 2014) is adopted in the training process. We initialize the learning rate to 1e-4. It gradually decreases as the model is updated during training and eventually falls to 1e-5. The batch size is set to 4. We train each model for about 1,000 epochs. The input training data is cropped to 256×256, convenient for model inference. (A hedged sketch of this setup follows the table.) |
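
Since the paper does not release code, the reported setup can only be approximated. Below is a minimal PyTorch sketch of the training configuration quoted above (Adam, learning rate decayed from 1e-4 to 1e-5, batch size 4, roughly 1,000 epochs, 256×256 inputs, and a rate-distortion weight λ). The `RandomRGBDPairs` dataset, `ToyCodec` model, `rate_distortion_loss` form, and the exponential decay schedule are placeholder assumptions, not the authors' implementation.

```python
# Hedged sketch of the reported training configuration; model, data, and
# loss are placeholders standing in for the authors' unreleased code.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset


class RandomRGBDPairs(Dataset):
    """Placeholder dataset yielding random 256x256 RGB-D tensors (3 RGB + 1 depth channel).
    The paper's random 8,000/1,000/1,000 split would be drawn from real image pairs instead."""

    def __init__(self, length=8000, crop=256):
        self.length, self.crop = length, crop

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        return torch.rand(4, self.crop, self.crop)


class ToyCodec(nn.Module):
    """Tiny stand-in codec: a conv autoencoder that also reports a placeholder bpp term."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(4, 8, kernel_size=3, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(8, 4, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        y = self.enc(x)
        x_hat = self.dec(y)
        bpp = y.abs().mean()  # placeholder "rate" estimate, not a real entropy model
        return x_hat, bpp


def rate_distortion_loss(x_hat, x, bpp, lam):
    """Common rate-distortion objective L = bpp + lambda * MSE (exact form is assumed)."""
    return bpp + lam * nn.functional.mse_loss(x_hat, x)


def train(model, lam=0.01, epochs=1000, batch_size=4):
    loader = DataLoader(RandomRGBDPairs(), batch_size=batch_size, shuffle=True)
    optimizer = optim.Adam(model.parameters(), lr=1e-4)
    # Decay the learning rate from 1e-4 toward 1e-5 over training; the paper does
    # not specify the schedule, so exponential decay is assumed here.
    gamma = (1e-5 / 1e-4) ** (1.0 / epochs)
    scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

    for epoch in range(epochs):
        for x in loader:
            x_hat, bpp = model(x)
            loss = rate_distortion_loss(x_hat, x, bpp, lam)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
```

A quick smoke test would be `train(ToyCodec(), lam=0.01, epochs=1)`; the specific λ values used to reach different bit rates are not given in the quoted text.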