MIMT: Masked Image Modeling Transformer for Video Compression

Authors: Jinxi Xiang, Kuan Tian, Jun Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed MIMT framework, equipped with the new transformer entropy model, achieves state-of-the-art performance on the HEVC, UVG, and MCL-JCV datasets, generally outperforming VVC in terms of PSNR and SSIM.
Researcher Affiliation | Industry | Tencent AI Lab, Shenzhen. {jinxixiang,kuantian,junejzhang,haroldhan,willyang}@tencent.com
Pseudocode | Yes | Algorithm 1: MIMT Iterative Decoding (a hedged sketch of this style of decoding follows the table).
Open Source Code | No | The paper states: 'We obtain the open-source code of DVC, SSF, DCVC, and DMC for decoding efficiency comparison.' However, it does not state that source code for the proposed MIMT model itself is publicly available.
Open Datasets | Yes | We use Vimeo-90k (Xue et al., 2019) for training. The test videos include the HEVC Class B, UVG (Mercat et al., 2020), and MCL-JCV (Wang et al., 2016) datasets.
Dataset Splits | No | The paper mentions 'The test videos include HEVC Class B, UVG (Mercat et al., 2020), and MCL-JCV (Wang et al., 2016) datasets.' but does not explicitly define training, validation, and test splits with percentages or counts.
Hardware Specification | Yes | We set the batch size as 8, using the Adam optimizer on a single V100 GPU.
Software Dependencies | No | The paper describes the model architecture and training process but does not specify software dependencies with version numbers (e.g., specific Python, library, or framework versions).
Experiment Setup | Yes | We set the GOP size as 32 for all datasets and use a learned model (Cheng et al., 2020) for I-frame compression. We train four models with different λ values {256, 512, 1024, 2048}. By default, models are trained with the MSE loss; when using the MS-SSIM metric, the model is fine-tuned with the MS-SSIM loss. Training uses multiple frames (up to 7) and 256×256 patches. In the first stage, two consecutive frames are used to train the model for 1M steps. Then the MIMT entropy model is added and trained for 1M steps. Finally, the sequence length is extended to 7 frames for 300K steps. The learning rate is set as 5e-5. We set the batch size as 8, using the Adam optimizer on a single V100 GPU. (A configuration sketch collecting these settings follows below.)