Multi-Modality Deep Network for Extreme Learned Image Compression

Authors: Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments, including a user study, prove that our method can obtain visually pleasing results at extremely low bitrates, and achieves a comparable or even better performance than state-of-the-art methods, even though these methods are at 2× to 4× bitrates of ours.
Researcher Affiliation | Academia | (1) School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Fudan University, Shanghai, China; (2) School of Communication, Shanghai University, Shanghai, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets | Yes | Following the previous multimodal machine learning-based works (Zhang et al. 2017), the widely used datasets including CUB (Wah et al. 2011) and Oxford-102 (Nilsback and Zisserman 2008) are employed for evaluation.
Dataset Splits | No | CUB consists of 200 species of bird, with a total of 11,788 images including 8,855 images for training and 2,933 images for testing. Oxford-102 has 102 flower categories, of which 7,034 images are utilized for training and 1,155 images are utilized for testing.
Hardware Specification | Yes | All the experiments are conducted on a NVIDIA GeForce RTX 1080 Ti.
Software Dependencies | No | The paper mentions "We adopt Pytorch as the training toolbox" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | Our model is optimized for 300 epochs with a learning rate of 1 × 10^-4. The hyper-parameters k1, k2, k3, k4 and β of the global loss function are empirically set as 0.075 × 2^-5, 0.15, 5, 0.005 and 40, respectively. The λb is set as 2^-4, and the λa is set as (2^3, 2^2, 2^1) to adapt to different bitrates.
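
Since no official code is linked, anyone attempting a reproduction may find it convenient to collect the reported settings in one place. The following is a minimal Python sketch, not the authors' implementation. The class and field names (`ReportedSetup`, `lambda_a`, `REPORTED_SPLITS`, `total_images`) are hypothetical. The exponents are an assumed reading of the extracted text: learning rate 1 × 10^-4, k1 = 0.075 × 2^-5, λb = 2^-4, and λa = (2^3, 2^2, 2^1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Settings quoted in the Experiment Setup row (names are hypothetical)."""

    epochs: int = 300
    learning_rate: float = 1e-4      # assumed reading: 1 x 10^-4
    k1: float = 0.075 * 2 ** -5      # assumed reading: 0.075 x 2^-5
    k2: float = 0.15
    k3: float = 5.0
    k4: float = 0.005
    beta: float = 40.0
    lambda_b: float = 2 ** -4        # assumed reading: 2^-4
    # One value per target bitrate; assumed reading: (2^3, 2^2, 2^1).
    lambda_a: tuple = (2 ** 3, 2 ** 2, 2 ** 1)


# Train/test image counts quoted in the Dataset Splits row.
REPORTED_SPLITS = {
    "CUB": {"train": 8855, "test": 2933},
    "Oxford-102": {"train": 7034, "test": 1155},
}


def total_images(dataset: str) -> int:
    """Sum the reported train and test counts for one dataset."""
    return sum(REPORTED_SPLITS[dataset].values())


if __name__ == "__main__":
    setup = ReportedSetup()
    print(setup)
    for name in REPORTED_SPLITS:
        print(name, total_images(name))
```

The CUB counts sum to the 11,788 images quoted in the table, which is a quick consistency check on the reported split.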