Transformer-based Transform Coding

Authors: Yinhao Zhu, Yang Yang, Taco Cohen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate image compression models on 4 datasets: Kodak (Kodak, 1999), CLIC2021 testset (CLIC, 2021), Tecnick testset (Asuni & Giachetti, 2014), and JPEG-AI testset (JPEG-AI, 2020). ... As can be seen from Figure 3, SwinT transform consistently outperforms its convolutional counterpart; the RD-performance of SwinT-Hyperprior is on-par with Conv-ChARM, despite the simpler prior; SwinT-ChARM outperforms VTM-12.1 across a wide PSNR range.
Researcher Affiliation | Industry | Yinhao Zhu, Yang Yang, Taco Cohen; Qualcomm AI Research; {yinhaoz, yyangy, tacos}@qti.qualcomm.com
Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 2 and Figures 10-13) but no formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Training: All image compression models are trained on the CLIC2020 training set. ... For P-frame compression models, we follow the training setup of SSF. Both Conv-SSF and SwinT-SSF are trained on Vimeo-90k Dataset (Xue et al., 2019)... We evaluate image compression models on 4 datasets: Kodak (Kodak, 1999), CLIC2021 testset (CLIC, 2021), Tecnick testset (Asuni & Giachetti, 2014), and JPEG-AI testset (JPEG-AI, 2020).
Dataset Splits | No | The paper specifies training sets (CLIC2020, Vimeo-90k) and evaluation test sets (Kodak, CLIC2021, Tecnick, JPEG-AI, UVG, MCL-JCV) but does not explicitly provide details for a separate validation split with specific percentages or sample counts.
Hardware Specification | Yes | The models run with PyTorch 1.9.0 on a workstation with one RTX 2080 Ti GPU. ... evaluated on an Intel Core i9-9940 CPU @ 3.30GHz, averaged over 24 Kodak images. ... same host machine with Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz.
Software Dependencies | Yes | The models run on a workstation with one RTX 2080 Ti GPU, with PyTorch 1.9.0 and CUDA toolkit 11.1.
Experiment Setup | Yes | Training: All image compression models are trained on the CLIC2020 training set. Conv-Hyperprior and SwinT-Hyperprior are trained with 2M batches. Each batch contains 8 random 256x256 crops from training images. Learning rate starts at 10^-4 and is reduced to 10^-5 at 1.8M steps. ... To cover a wide range of rate and distortion, for each solution, we train 5 models with β ∈ {0.003, 0.001, 0.0003, 0.0001, 0.00003}. ... For P-frame compression models, ... trained on Vimeo-90k Dataset (Xue et al., 2019) for 1M steps with learning rate 10^-4, batch size of 8, crop size of 256x256, followed by 50K steps of training with learning rate 10^-5 and crop size 384x256.
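
The Experiment Setup row quotes a concrete training recipe: 2M steps, batches of 8 random 256x256 crops from the CLIC2020 training set, a learning rate dropped from 10^-4 to 10^-5 at 1.8M steps, and five models per architecture trained over β ∈ {0.003, 0.001, 0.0003, 0.0001, 0.00003}. The PyTorch sketch below restates that schedule in code so the numbers are easy to check; it is not the authors' released implementation. The dataset path, the ImageDirDataset helper, the Adam optimizer, the model's (rate, distortion) output interface, and the placement of β in the rate-distortion loss are all assumptions made only for illustration.

    from pathlib import Path

    import torch
    from PIL import Image
    from torch.utils.data import DataLoader, Dataset
    from torchvision import transforms

    # Five rate points, as quoted in the Experiment Setup row.
    BETAS = [0.003, 0.001, 0.0003, 0.0001, 0.00003]


    class ImageDirDataset(Dataset):
        """Flat directory of training images (a local CLIC2020 copy is assumed;
        the directory layout and file pattern are not specified in the paper)."""

        def __init__(self, root, crop=256):
            self.paths = sorted(Path(root).glob("*.png"))
            self.tfm = transforms.Compose([
                transforms.RandomCrop(crop, pad_if_needed=True),
                transforms.ToTensor(),
            ])

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, i):
            return self.tfm(Image.open(self.paths[i]).convert("RGB"))


    def train_one_model(model, data_root, beta,
                        total_steps=2_000_000, lr_drop_step=1_800_000):
        # "Each batch contains 8 random 256x256 crops from training images."
        loader = DataLoader(ImageDirDataset(data_root), batch_size=8,
                            shuffle=True, num_workers=4, drop_last=True)
        # "Learning rate starts at 10^-4 and is reduced to 10^-5 at 1.8M steps."
        # Adam is an assumption; the quoted text does not name the optimizer.
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        step = 0
        while step < total_steps:
            for x in loader:
                if step == lr_drop_step:
                    for group in opt.param_groups:
                        group["lr"] = 1e-5
                # Assumed model interface: a forward pass returning an estimated
                # rate and a distortion term for the batch.
                rate, distortion = model(x)
                # Assumed Lagrangian convention; which term beta weights is not
                # stated in the quoted text.
                loss = distortion + beta * rate
                opt.zero_grad()
                loss.backward()
                opt.step()
                step += 1
                if step >= total_steps:
                    break
        return model


    # "... for each solution, we train 5 models":
    # for beta in BETAS:
    #     train_one_model(build_swint_hyperprior(), "clic2020/train", beta)
    # (build_swint_hyperprior() is a hypothetical constructor, not a real API.)

On the software side, the stack quoted in the Software Dependencies row (PyTorch 1.9.0 with CUDA toolkit 11.1) can be confirmed at runtime from torch.__version__ and torch.version.cuda.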