Frequency-Aware Transformer for Learned Image Compression
Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and evidently outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets. Code will be released at https://github.com/qingshi9974/ICLR2024-FTIC |
| Researcher Affiliation | Academia | Han Li¹, Shaohui Li², Wenrui Dai¹, Chenglin Li¹, Junni Zou¹, Hongkai Xiong¹ — ¹Shanghai Jiao Tong University, ²Tsinghua Shenzhen International Graduate School, Tsinghua University. {qingshi9974,daiwenrui,lcl1985,zoujunni,xionghongkai}@sjtu.edu lishaohui@sz.tsinghua.edu.cn |
| Pseudocode | No | The paper provides detailed architectural diagrams and parameter tables (e.g., Figure 7, Table 4) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | Code will be released at https://github.com/qingshi9974/ICLR2024-FTIC |
| Open Datasets | Yes | We train the proposed FTIC models on the Flickr2W (Liu et al., 2020) and ImageNet-1k (Deng et al., 2009) datasets for 3.2M steps with a batch size of 8. ...We evaluate the proposed model on three benchmark datasets, i.e., the Kodak image set (Kodak, 1993) with 24 images of 768×512 pixels, the Tecnick testset (Asuni & Giachetti, 2014) with 100 images of 1200×1200 pixels, and the CLIC Professional Validation dataset (CLIC, 2021) with 41 images of at most 2K resolution. |
| Dataset Splits | No | The paper mentions training on Flickr2W and ImageNet-1k datasets and evaluating on Kodak, Tecnick, and CLIC. However, it does not specify the train/validation/test splits used for the training datasets (Flickr2W/ImageNet-1k) or any explicit validation procedure during training. |
| Hardware Specification | Yes | We use NVIDIA GeForce RTX 4090 and Intel Xeon Platinum 8260 to conduct the following experiments. ...The experiments are conducted on a single NVIDIA GeForce RTX 4090 with 24 GB memory. |
| Software Dependencies | No | The paper mentions using Adam optimizer and training details but does not specify software dependencies like PyTorch, TensorFlow, or CUDA with their version numbers. |
| Experiment Setup | Yes | We train the proposed FTIC models on the Flickr2W (Liu et al., 2020) and ImageNet-1k (Deng et al., 2009) datasets for 3.2M steps with a batch size of 8. The model is optimized using the Adam optimizer with the learning rate initialized as 1e-4. ...The Lagrangian multipliers used for training MSE-optimized models are {0.0025, 0.0035, 0.0067, 0.0130, 0.0250, 0.0483}, and those for MS-SSIM-optimized models are {2.40, 4.58, 8.73, 16.64, 31.73, 60.50}. ...The first stage is trained for 2M steps with a learning rate of 1e-4. Each batch contains 8 patches of size 256×256 randomly cropped from the training images. The second stage is trained for 1M steps with the same learning rate. Finally, we train the model with a learning rate of 1e-5 for 200K steps using a larger crop size of 384×384. For all training, the Adam optimizer is used without weight decay. |
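The Lagrangian multipliers quoted above weight distortion against rate in the standard learned-image-compression objective L = R + λ·D. A minimal sketch of that loss computation (hypothetical tensor names; assuming MSE distortion scaled by 255² and rate estimated from latent likelihoods, the common CompressAI-style convention — not necessarily the paper's exact implementation):

```python
import math
import torch

def rate_distortion_loss(x, x_hat, likelihoods, lmbda=0.0130):
    """Rate-distortion training objective L = R + lambda * 255^2 * MSE.

    x, x_hat: original and reconstructed images in [0, 1], shape (N, C, H, W).
    likelihoods: per-element likelihoods of the quantized latents under the
        entropy model (values in (0, 1]).
    lmbda: Lagrangian multiplier; the paper sweeps {0.0025, ..., 0.0483}
        for its MSE-optimized models.
    """
    num_pixels = x.size(0) * x.size(2) * x.size(3)
    # Rate term: estimated bits per pixel from the entropy model's likelihoods.
    bpp = torch.log(likelihoods).sum() / (-math.log(2) * num_pixels)
    # Distortion term: MSE on [0, 1] images, scaled by 255^2.
    mse = torch.mean((x - x_hat) ** 2)
    return bpp + lmbda * 255 ** 2 * mse
```

Each λ in the sweep yields one model on the rate-distortion curve: small λ favors low bitrate, large λ favors high fidelity.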