Transformer-based Transform Coding
Authors: Yinhao Zhu, Yang Yang, Taco Cohen
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate image compression models on 4 datasets: Kodak (Kodak, 1999), CLIC2021 testset (CLIC, 2021), Tecnick testset (Asuni & Giachetti, 2014), and JPEG-AI testset (JPEG-AI, 2020). ... As can be seen from Figure 3, SwinT transform consistently outperforms its convolutional counterpart; the RD-performance of SwinT-Hyperprior is on-par with Conv-ChARM, despite the simpler prior; SwinT-ChARM outperforms VTM-12.1 across a wide PSNR range. |
| Researcher Affiliation | Industry | Yinhao Zhu, Yang Yang, Taco Cohen; Qualcomm AI Research; {yinhaoz, yyangy, tacos}@qti.qualcomm.com |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 2, Figure 10-13) but no formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Training All image compression models are trained on the CLIC2020 training set. ... For P-frame compression models, we follow the training setup of SSF. Both Conv-SSF and SwinT-SSF are trained on Vimeo-90k Dataset (Xue et al., 2019)... We evaluate image compression models on 4 datasets: Kodak (Kodak, 1999), CLIC2021 testset (CLIC, 2021), Tecnick testset (Asuni & Giachetti, 2014), and JPEG-AI testset (JPEG-AI, 2020). |
| Dataset Splits | No | The paper specifies training sets (CLIC2020, Vimeo-90k) and evaluation test sets (Kodak, CLIC2021, Tecnick, JPEG-AI, UVG, MCL-JCV) but does not explicitly provide details for a separate validation split with specific percentages or sample counts. |
| Hardware Specification | Yes | The models run with PyTorch 1.9.0 on a workstation with one RTX 2080 Ti GPU. ... evaluated on an Intel Core i9-9940 CPU @ 3.30GHz, averaged over 24 Kodak images. ... same host machine with Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz. |
| Software Dependencies | Yes | The models run with PyTorch 1.9.0 and CUDA toolkit 11.1 on a workstation with one RTX 2080 Ti GPU. |
| Experiment Setup | Yes | Training: All image compression models are trained on the CLIC2020 training set. Conv-Hyperprior and SwinT-Hyperprior are trained with 2M batches. Each batch contains 8 random 256×256 crops from training images. Learning rate starts at 10⁻⁴ and is reduced to 10⁻⁵ at 1.8M steps. ... To cover a wide range of rate and distortion, for each solution, we train 5 models with β ∈ {0.003, 0.001, 0.0003, 0.0001, 0.00003}. ... For P-frame compression models, ... trained on Vimeo-90k Dataset (Xue et al., 2019) for 1M steps with learning rate 10⁻⁴, batch size of 8, crop size of 256×256, followed by 50K steps of training with learning rate 10⁻⁵ and crop size 384×256. (See the training-loop sketch below this table.) |
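
The schedule quoted in the Experiment Setup row maps onto a standard rate-distortion training loop. Below is a minimal PyTorch sketch, assuming the usual L = D + βR objective for neural transform coding; the model's forward API, the dataloader, and the rate estimate are hypothetical placeholders, while the step counts, learning rates, batch/crop sizes, and β values come from the paper's quoted setup.

```python
# Hedged sketch of the quoted image-compression training schedule.
# Only the schedule constants are from the paper; the model and data
# interfaces below are assumptions for illustration.

import torch

BETAS = [0.003, 0.001, 0.0003, 0.0001, 0.00003]  # one model per beta
TOTAL_STEPS = 2_000_000    # 2M batches for the (SwinT-)Hyperprior models
LR_DROP_STEP = 1_800_000   # learning rate 1e-4 -> 1e-5 at 1.8M steps
BATCH_SIZE = 8             # 8 random 256x256 crops per batch
CROP = 256

def train_one_model(model, dataloader, beta, device="cuda"):
    """Train one rate-distortion model for a single beta (hypothetical loop)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Drop the learning rate by 10x at 1.8M steps, as quoted above.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[LR_DROP_STEP], gamma=0.1)

    step = 0
    while step < TOTAL_STEPS:
        for batch in dataloader:          # assumed shape: (8, 3, 256, 256)
            x = batch.to(device)
            x_hat, rate = model(x)        # assumed API: reconstruction + rate estimate
            distortion = torch.mean((x - x_hat) ** 2)  # MSE distortion (PSNR models)
            loss = distortion + beta * rate             # L = D + beta * R
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
            step += 1
            if step >= TOTAL_STEPS:
                break
    return model
```

Each β is trained as a separate model, so the five runs together trace out one rate-distortion curve per architecture.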