Entroformer: A Transformer-based Entropy Model for Learned Image Compression
Authors: Yichen Qian, Xiuyu Sun, Ming Lin, Zhiyu Tan, Rong Jin
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTAL RESULTS: We evaluate the effects of our transformer-based entropy model by calculating the rate-distortion (RD) performance. Figure 5 shows the RD curves over the publicly available Kodak dataset (Kodak, 1993) using peak signal-to-noise ratio (PSNR) as the image quality metric. As shown in the left part, our Entroformer with the joint hyperprior and context modules outperforms the state-of-the-art CNN methods by 5.2% and BPG by 20.5% at low bit rates. (A minimal PSNR computation sketch is given after this table.) |
| Researcher Affiliation | Industry | Yichen Qian (Alibaba Group, Hangzhou, China, yichen.qyc@alibaba-inc.com); Ming Lin (Alibaba Group, Bellevue, WA 98004, USA, ming.l@alibaba-inc.com); Xiuyu Sun (Alibaba Group, Hangzhou, China, xiuyu.sxy@alibaba-inc.com); Zhiyu Tan (Alibaba Group, Hangzhou, China, zhiyu.tzy@alibaba-inc.com); Rong Jin (Alibaba Group, Bellevue, WA 98004, USA, jinrong.jr@alibaba-inc.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/damo-cv/entroformer. |
| Open Datasets | Yes | We choose 14886 images from Open Image (Krasin et al., 2017) as our training data. |
| Dataset Splits | No | The paper specifies training data and a test set (Kodak dataset) but does not explicitly detail the use or splitting of a separate validation set. |
| Hardware Specification | Yes | All models are trained for 300 epochs with a batch size of 16 and a patch size of 384×384 on a 16GB Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9, β2 = 0.999, ϵ = 1×10⁻⁸, and a base learning rate of 1×10⁻⁴. When training transformers, it is standard practice to use a warmup phase at the beginning of learning, during which the learning rate increases from zero to its peak value (Vaswani et al., 2017). We use a warmup over the first 5% of the total epochs; the learning rate then decays stepwise by a factor of 0.75 every 1/5 of the total epochs. Gradient clipping, set to 1.0, is also helpful in the compression setup. All models are trained for 300 epochs with a batch size of 16 and a patch size of 384×384 on a 16GB Tesla V100 GPU. (A training-loop sketch of these settings follows the table.) |
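
The training recipe quoted in the Experiment Setup row can be summarized in a short PyTorch sketch. This is a minimal illustration only: the model, data, and loss below are placeholders (the paper's Entroformer architecture and rate-distortion loss are not reproduced here), while the optimizer settings, warmup proportion, stepwise decay, gradient clipping, epoch count, batch size, and patch size follow the quoted description.

```python
import torch
import torch.nn as nn

# Hyperparameters taken from the quoted "Experiment Setup" row; everything
# else (model, data, loss) is a placeholder, not the authors' implementation.
TOTAL_EPOCHS = 300
WARMUP_EPOCHS = int(0.05 * TOTAL_EPOCHS)  # warmup spans 5% of total epochs
DECAY_STEP = TOTAL_EPOCHS // 5            # stepwise decay every 1/5 of the epochs
BASE_LR = 1e-4

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # stand-in for the Entroformer

optimizer = torch.optim.Adam(model.parameters(), lr=BASE_LR,
                             betas=(0.9, 0.999), eps=1e-8)

def lr_lambda(epoch: int) -> float:
    """Linear warmup, then decay by a factor of 0.75 every DECAY_STEP epochs."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    return 0.75 ** ((epoch - WARMUP_EPOCHS) // DECAY_STEP)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(TOTAL_EPOCHS):
    batch = torch.randn(16, 3, 384, 384)   # batch size 16, 384x384 crops (random placeholder data)
    loss = model(batch).abs().mean()        # placeholder for the rate-distortion loss
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping at 1.0
    optimizer.step()
    scheduler.step()
```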
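
For reference, the PSNR metric cited in the Research Type row is conventionally computed from the mean squared error between the original and reconstructed images. The sketch below is a generic implementation assuming inputs scaled to [0, 1]; it is not the authors' evaluation code, and the toy images are only for demonstration.

```python
import torch

def psnr(x: torch.Tensor, y: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB for tensors scaled to [0, max_val]."""
    mse = torch.mean((x - y) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# Toy usage: compare an image with a slightly perturbed copy.
original = torch.rand(1, 3, 512, 768)  # Kodak images are 512x768
reconstruction = (original + 0.01 * torch.randn_like(original)).clamp(0, 1)
print(f"PSNR: {psnr(original, reconstruction):.2f} dB")
```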