Multi-Modality Deep Network for Extreme Learned Image Compression
Authors: Xuhao Jiang, Weimin Tan, Tian Tan, Bo Yan, Liquan Shen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments, including a user study, prove that our method can obtain visually pleasing results at extremely low bitrates, and achieves a comparable or even better performance than state-of-the-art methods, even though these methods are at 2× to 4× bitrates of ours. |
| Researcher Affiliation | Academia | ¹School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Fudan University, Shanghai, China; ²School of Communication, Shanghai University, Shanghai, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | Following the previous multimodal machine learning-based works (Zhang et al. 2017), the widely used datasets including CUB (Wah et al. 2011) and Oxford-102 (Nilsback and Zisserman 2008) are employed for evaluation. |
| Dataset Splits | No | CUB consists of 200 bird species, with a total of 11,788 images: 8,855 for training and 2,933 for testing. Oxford-102 has 102 flower categories, of which 7,034 images are used for training and 1,155 for testing. Only train/test counts are reported; no validation split is specified. |
| Hardware Specification | Yes | All the experiments are conducted on an NVIDIA GeForce GTX 1080 Ti. |
| Software Dependencies | No | The paper mentions "We adopt Pytorch as the training toolbox" but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Our model is optimized for 300 epochs with a learning rate of 1 × 10⁻⁴. The hyper-parameters k1, k2, k3, k4 and β of the global loss function are empirically set as 0.075 × 2⁻⁵, 0.15, 5, 0.005 and 40, respectively. The λb is set as 2⁻⁴, and the λa is set as (2³, 2², 2¹) to adapt to different bitrates. A hedged sketch of this setup follows the table. |
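The exponents in the Experiment Setup row were lost in PDF extraction (e.g. "1 10 4" for 1 × 10⁻⁴) and are reconstructed above. As a rough illustration only, the following minimal PyTorch sketch wires those reconstructed values into a training setup. The placeholder term names, the assignment of k1–k4 and β to particular loss components, and the λa/λb switching rule are assumptions, not details confirmed by the quoted excerpt; the two-regime rate weight mirrors the HiFiC-style rate controller common in GAN-based compression, and whether this paper uses exactly that rule is likewise an assumption.

```python
import torch

# Values quoted in the "Experiment Setup" row; exponents reconstructed from
# the garbled extraction ("1 10 4" -> 1e-4, "0.075 2 5" -> 0.075 * 2**-5,
# "2 4" -> 2**-4, "(23, 22, 21)" -> (2**3, 2**2, 2**1)).
LEARNING_RATE = 1e-4
NUM_EPOCHS = 300
K1, K2, K3, K4, BETA = 0.075 * 2 ** -5, 0.15, 5.0, 0.005, 40.0
LAMBDA_B = 2 ** -4
LAMBDA_A_CHOICES = (2 ** 3, 2 ** 2, 2 ** 1)  # one value per target bitrate


def rate_weight(bpp: float, target_bpp: float, lambda_a: float) -> float:
    # Hypothetical two-regime rate control: penalize the rate term with the
    # large lambda_a only while the model exceeds its bpp target, otherwise
    # fall back to the small lambda_b.
    return lambda_a if bpp > target_bpp else LAMBDA_B


def global_loss(terms: dict, bpp: torch.Tensor, lam: float) -> torch.Tensor:
    # The excerpt does not say which of k1..k4/beta weights which loss term,
    # so the term names below are placeholders, not the paper's notation.
    weights = {"term1": K1, "term2": K2, "term3": K3, "term4": K4, "term5": BETA}
    distortion = sum(w * terms[name] for name, w in weights.items())
    return distortion + lam * bpp


# Minimal usage with dummy scalars standing in for the real loss components.
dummy_terms = {f"term{i}": torch.rand(()) for i in range(1, 6)}
dummy_bpp = torch.tensor(0.05)
lam = rate_weight(dummy_bpp.item(), target_bpp=0.03, lambda_a=LAMBDA_A_CHOICES[0])
loss = global_loss(dummy_terms, dummy_bpp, lam)

# Placeholder network; the paper's architecture is not reproduced here.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```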