Offline and Online Optical Flow Enhancement for Deep Video Compression

Authors: Chuanbo Tang, Xihua Sheng, Zhuoyuan Li, Haotian Zhang, Li Li, Dong Liu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on two state-of-the-art deep video compression schemes, DCVC and DCVC-DC. Experimental results demonstrate that the proposed offline and online enhancement together achieves on average 13.4% bitrate saving for DCVC and 4.1% bitrate saving for DCVC-DC on the tested videos, without increasing the model or computational complexity of the decoder side.
Researcher Affiliation | Academia | University of Science and Technology of China; {cbtang,xhsheng,zhuoyuanli,zhanghaotian}@mail.ustc.edu.cn, {lil1,dongeliu}@ustc.edu.cn
Pseudocode | Yes | Algorithm 1: Optical Flow Latent Updating in the Inference Stage (a hedged sketch of this updating loop follows the table).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it contain an explicit statement of code release or a link to a repository.
Open Datasets | Yes | We use BVI-DVC (Ma, Zhang, and Bull 2021) dataset for fine-tuning Spynet. ... The commonly-used Vimeo-90k (Xue et al. 2019) dataset is used for training DCVC and DCVC-DC in an end-to-end manner.
Dataset Splits | No | The paper mentions that 'all the videos of training sets are randomly cropped into 256 × 256 patches' for training and that 'We test 96 frames for each video', but it does not specify quantitative training/validation/test splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | Yes | The motion vectors are extracted by VTM-10.0. The Adam optimizer (Kingma and Ba 2014) is used. Cheng2020Anchor (Cheng et al. 2020), as implemented in CompressAI (Bégaint et al. 2020), is used (a CompressAI loading snippet follows the table).
Experiment Setup | Yes | In the first stage, we set λ_ME to 100 and fine-tune the Spynet using the extracted MVs for 1,000,000 iterations. In the second stage, we deploy the enhanced Spynet into the video codec and train the whole video compression network for 5,000,000 iterations until convergence. Finally, we set the updating times N in Algorithm 1 to 1500 according to the ablation study. The initial learning rate for the first two training stages is 1e-4, which is decreased to 5e-5 at the 800,000th and 4,000,000th iterations, respectively. The initial learning rate for online optimization is 5e-3, which is decreased by 50% at the 1200th iteration. The Adam optimizer (Kingma and Ba 2014) is used, and the batch size is set to 16 for the first training stage and 4 for the second training stage. (A sketch of this two-stage schedule also follows the table.)