MIMT: Masked Image Modeling Transformer for Video Compression
Authors: Jinxi Xiang, Kuan Tian, Jun Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the proposed MIMT framework equipped with the new transformer entropy model achieves state-of-the-art performance on HEVC, UVG, and MCL-JCV datasets, generally outperforming VVC in terms of PSNR and SSIM. |
| Researcher Affiliation | Industry | Tencent AI Lab, Shenzhen. {jinxixiang, kuantian, junejzhang, haroldhan, willyang}@tencent.com |
| Pseudocode | Yes | Algorithm 1: MIMT Iterative Decoding (a hedged sketch of this decoding loop follows the table). |
| Open Source Code | No | The paper states: 'We obtain the open-source code of DVC, SSF, DCVC, and DMC for decoding efficiency comparison.' However, it does not explicitly state that the source code for the proposed MIMT model is publicly available. |
| Open Datasets | Yes | We use Vimeo-90k (Xue et al., 2019) for training. The test videos include HEVC Class B, UVG (Mercat et al., 2020), and MCL-JCV (Wang et al., 2016) datasets. |
| Dataset Splits | No | The paper mentions 'The test videos include HEVC Class B, UVG (Mercat et al., 2020), and MCL-JCV (Wang et al., 2016) datasets.' but does not explicitly define training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | Yes | We set the batch size as 8, using the Adam optimizer on a single V100 GPU. |
| Software Dependencies | No | The paper describes the model architecture and training process but does not specify software dependencies with version numbers (e.g., specific Python, library, or framework versions). |
| Experiment Setup | Yes | We set the GOP size as 32 for all datasets and use a learned model (Cheng et al., 2020) for I-frame compression. We train four models with different λ values {256, 512, 1024, 2048}. By default, we train models with MSE loss. When using the MS-SSIM metric, the model is fine-tuned with the MS-SSIM loss. We apply multi-frame training (up to 7 frames) with 256×256 patches. In the first stage, we use two consecutive frames... to train our model for 1M steps. Then we add the MIMT entropy model... for 1M steps. Finally, we extend the length... to 7 frames for 300K steps. The learning rate is set as 5e-5. We set the batch size as 8, using the Adam optimizer on a single V100 GPU. (A configuration sketch of this schedule follows the table.) |
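
The iterative decoding referenced in the Pseudocode row follows the masked-image-modeling pattern: all latent tokens start masked, and on each pass the transformer conditions on already-decoded tokens to predict distributions for the rest, a growing subset of which is then entropy-decoded. Below is a minimal Python sketch of such a loop; `entropy_model`, `arithmetic_decoder`, the cosine schedule, and the confidence-first decoding order are illustrative assumptions in the MaskGIT style, not the paper's Algorithm 1 verbatim (no MIMT code was released).

```python
import math
import torch

def mimt_style_iterative_decode(entropy_model, arithmetic_decoder,
                                num_tokens, num_iters=8):
    """Hedged sketch of masked-transformer iterative decoding.

    Assumed (hypothetical) interfaces:
      - entropy_model(tokens, mask) -> logits of shape (num_tokens, vocab)
      - arithmetic_decoder(probs)   -> one symbol read from the bitstream
    """
    tokens = torch.zeros(num_tokens, dtype=torch.long)   # placeholder values
    mask = torch.ones(num_tokens, dtype=torch.bool)      # True = still masked

    for t in range(1, num_iters + 1):
        # Predict per-token distributions conditioned on decoded tokens.
        logits = entropy_model(tokens, mask)
        probs = logits.softmax(dim=-1)

        # Cosine schedule: fewer tokens remain masked at each iteration;
        # at t == num_iters everything is decoded.
        n_keep_masked = int(num_tokens * math.cos(math.pi / 2 * t / num_iters))
        n_decode = int(mask.sum()) - n_keep_masked
        if n_decode <= 0:
            continue

        # Decode the most confident masked positions first (illustrative rule).
        confidence = probs.max(dim=-1).values
        confidence[~mask] = -1.0                          # exclude decoded slots
        idx = confidence.topk(n_decode).indices

        for i in idx.tolist():
            tokens[i] = arithmetic_decoder(probs[i])      # entropy-decode symbol
            mask[i] = False

    return tokens
```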
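
The Experiment Setup row describes three training stages at four rate points. The sketch below restates that schedule as configuration, together with the standard learned-compression rate-distortion objective L = λ·D + R; the dictionary layout and function names are assumptions for illustration, since MIMT's training code is not public.

```python
# Three-stage schedule as described in the Experiment Setup row.
LAMBDAS = [256, 512, 1024, 2048]   # one model per rate point (MSE-optimized)

STAGES = [
    {"name": "two-frame pretraining",       "frames": 2, "steps": 1_000_000},
    {"name": "add MIMT entropy model",      "frames": 2, "steps": 1_000_000},
    {"name": "multi-frame fine-tuning",     "frames": 7, "steps": 300_000},
]

OPTIM = {"optimizer": "Adam", "lr": 5e-5, "batch_size": 8, "patch": (256, 256)}

def rd_loss(distortion, bits_per_pixel, lam):
    """Standard rate-distortion objective: L = lambda * D + R.

    Here `distortion` is MSE for the default models (MS-SSIM for the
    MS-SSIM-fine-tuned ones) and `bits_per_pixel` is the estimated rate.
    """
    return lam * distortion + bits_per_pixel
```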