EVC: Towards Real-Time Neural Image Compression with Mask Decay
Authors: Guo-Hua Wang, Jiahao Li, Bin Li, Yan Lu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS Training & Testing. The training dataset is the training part of Vimeo-90K (Xue et al., 2019) septuplet dataset. For a fair comparison, all models are trained with 200 epochs. ... Testing datasets contain Kodak (Franzen, 1999), HEVC test sequences (Sharman & Suehring, 2017), Tecnick (Asuni & Giachetti, 2014). BD-Rate (Bjontegaard, 2001) for peak signal-to-noise ratio (PSNR) versus bits-per-pixel (BPP) is our main metric to compare different models. |
| Researcher Affiliation | Collaboration | Guo-Hua Wang1 , Jiahao Li2, Bin Li2, Yan Lu2 1State Key Laboratory for Novel Software Technology, Nanjing University 2Microsoft Research Asia wangguohua@lamda.nju.edu.cn, {li.jiahao,libin,yanlu}@microsoft.com |
| Pseudocode | No | The paper describes algorithms and methods but does not present any formal pseudocode blocks or sections labeled “Algorithm”. |
| Open Source Code | Yes | Our code is at https://github.com/microsoft/DCVC. |
| Open Datasets | Yes | The training dataset is the training part of Vimeo-90K (Xue et al., 2019) septuplet dataset. ... Testing datasets contain Kodak (Franzen, 1999), HEVC test sequences (Sharman & Suehring, 2017), Tecnick (Asuni & Giachetti, 2014). |
| Dataset Splits | No | The paper states: “During training, the raw image is randomly cropped into 256×256 patches. For a fair comparison, all models are trained with 200 epochs.” It does not explicitly mention “validation” splits with percentages or counts, or specific pre-defined validation sets. It talks about “Training dataset” and “Testing datasets”. While it uses “finetuned for a few epochs” after mask decay, it doesn't specify a validation split for this. |
| Hardware Specification | Yes | We test the latency on two computers. One is equipped with one 2080Ti GPU and two Intel(R) Xeon(R) E5-2630 v3 CPUs. Another is equipped with one A100 GPU and one AMD Epyc 7V13 CPU. ... All models are trained on a computer with 8 V100 GPUs. |
| Software Dependencies | Yes | We use CompressAI-1.1.0 to gather evaluation results. We install these two software packages according to the official instructions. The default configuration of VTM is used. All results are gathered by running the following command: `python -m compressai.utils.bench vtm [path to image folder] -c [path to VTM folder]/cfg/encoder_intra_vtm.cfg -b [path to VTM folder]/bin -j 8 -q 24 26 28 30 32 34 36 38 40 42` |
| Experiment Setup | Yes | The training dataset is the training part of the Vimeo-90K (Xue et al., 2019) septuplet dataset. During training, the raw image is randomly cropped into 256×256 patches. For a fair comparison, all models are trained for 200 epochs. For our method, it costs 60 epochs for the teacher to transform into the student with mask decay, then the student is finetuned for 140 epochs. AdamW is used as the optimizer with batch size 16. The initial learning rate is 2e-4, and decays by 0.5 at epochs 50, 90, 130, 170. The default decay rate η for our mask decay is 4e-5. |
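The paper's main metric is BD-Rate (Bjontegaard, 2001) for PSNR versus BPP. As a reference for readers checking the reported numbers, below is a common numpy-based implementation of the Bjøntegaard delta-rate (cubic fit of log-rate against PSNR, integrated over the overlapping PSNR range); the function name `bd_rate` is ours, and this is a standard sketch of the metric, not the authors' evaluation code.

```python
import numpy as np

def bd_rate(bpp_anchor, psnr_anchor, bpp_test, psnr_test):
    """Average bitrate difference (%) of the test codec vs. the anchor.

    Negative values mean the test codec needs fewer bits at equal PSNR.
    """
    # Fit cubic polynomials of log(rate) as a function of PSNR.
    log_a = np.log(np.asarray(bpp_anchor, dtype=float))
    log_t = np.log(np.asarray(bpp_test, dtype=float))
    poly_a = np.polyfit(psnr_anchor, log_a, 3)
    poly_t = np.polyfit(psnr_test, log_t, 3)

    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the average log-rate difference back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0
```

With identical rate-distortion points the metric is 0%, and halving the bitrate at every quality level yields -50%, which is a quick sanity check for any BD-rate implementation.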
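The learning-rate schedule quoted above (initial 2e-4, halved at epochs 50, 90, 130, 170) can be written down directly; the helper below is a hypothetical sketch of that stepwise schedule, not code from the paper.

```python
def learning_rate(epoch, base_lr=2e-4, milestones=(50, 90, 130, 170), gamma=0.5):
    """Stepwise decay: multiply the base LR by gamma at each milestone epoch.

    Assumes (our reading of the paper) that the decay applies from the
    milestone epoch onward, i.e. epoch >= milestone counts as one drop.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops

# learning_rate(0)   -> 2e-4    (no drops yet)
# learning_rate(170) -> 1.25e-5 (four halvings: 2e-4 / 16)
```

In PyTorch terms this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[50, 90, 130, 170]` and `gamma=0.5` on an AdamW optimizer.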