Offline and Online Optical Flow Enhancement for Deep Video Compression
Authors: Chuanbo Tang, Xihua Sheng, Zhuoyuan Li, Haotian Zhang, Li Li, Dong Liu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on two state-of-the-art deep video compression schemes, DCVC and DCVC-DC. Experimental results demonstrate that the proposed offline and online enhancement together achieves on average 13.4% bitrate saving for DCVC and 4.1% bitrate saving for DCVC-DC on the tested videos, without increasing the model or computational complexity of the decoder side. |
| Researcher Affiliation | Academia | University of Science and Technology of China {cbtang,xhsheng,zhuoyuanli,zhanghaotian}@mail.ustc.edu.cn, {lil1,dongeliu}@ustc.edu.cn |
| Pseudocode | Yes | Algorithm 1: Optical Flow Latent Updating in the Inference Stage |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it contain an explicit statement of code release or a link to a repository. |
| Open Datasets | Yes | We use BVI-DVC (Ma, Zhang, and Bull 2021) dataset for fine-tuning Spynet. ... The commonly-used Vimeo-90k (Xue et al. 2019) dataset is used for training DCVC and DCVC-DC in an end-to-end manner. |
| Dataset Splits | No | The paper mentions 'all the videos of training sets are randomly cropped into 256 × 256 patches' for training and 'We test 96 frames for each video' but does not specify a quantitative training/test/validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | Yes | The motion vectors are extracted by VTM 10.0. The Adam optimizer (Kingma and Ba 2014) is used. Cheng2020Anchor (Cheng et al. 2020) implemented by CompressAI (Bégaint et al. 2020). |
| Experiment Setup | Yes | In the first stage, we set λ_ME to 100, and fine-tune the Spynet using the extracted MVs for 1,000,000 iterations. In the second stage, we deploy the enhanced Spynet into the video codec and train the whole video compression network for 5,000,000 iterations until convergence. Finally, we set the number of updating iterations N in Algorithm 1 to 1500 according to the ablation study. The initial learning rate for the first two stages is 1e-4, and it is decreased to 5e-5 at the 800,000th and 4,000,000th iterations, respectively. The initial learning rate for online optimization is 5e-3, which is decreased by 50% at the 1,200th iteration. The Adam optimizer (Kingma and Ba 2014) is used, and the batch size is set to 16 for the first training stage and 4 for the second training stage. |
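
To make the quoted offline (encoder-side) training schedule easier to parse, here is a minimal sketch assuming a PyTorch training loop. The stage names and the `build_optimizer` helper are illustrative and not taken from any released code; only the numeric hyperparameters come from the Experiment Setup row above.

```python
import torch

# Hedged summary of the two offline training stages quoted above.
OFFLINE_STAGES = {
    "stage1_finetune_spynet": {
        "lambda_me": 100,          # motion-estimation loss weight from the paper
        "iterations": 1_000_000,
        "batch_size": 16,
        "lr": 1e-4,
        "lr_drop": {"iteration": 800_000, "new_lr": 5e-5},
    },
    "stage2_end_to_end": {
        "iterations": 5_000_000,
        "batch_size": 4,
        "lr": 1e-4,
        "lr_drop": {"iteration": 4_000_000, "new_lr": 5e-5},
    },
}

def build_optimizer(params, stage_cfg):
    """Adam with the single learning-rate drop described in the paper.

    The returned scheduler should be stepped once per training iteration.
    """
    opt = torch.optim.Adam(params, lr=stage_cfg["lr"])
    drop = stage_cfg["lr_drop"]
    sched = torch.optim.lr_scheduler.MultiStepLR(
        opt,
        milestones=[drop["iteration"]],
        gamma=drop["new_lr"] / stage_cfg["lr"],  # 5e-5 / 1e-4 = 0.5
    )
    return opt, sched
```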
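
Likewise, a hedged sketch of the online enhancement (Algorithm 1, optical flow latent updating in the inference stage) is given below. The codec interface (`flow_estimator`, `motion_encoder`, `motion_entropy`, `decode_frame`) and the exact loss form are assumptions for illustration; only the updating count N = 1500, the Adam optimizer, and the 5e-3 learning rate halved at the 1,200th iteration come from the quoted setup.

```python
import torch

def update_motion_latent(codec, x_cur, ref_frame, lambda_rd,
                         num_iters=1500, lr=5e-3, lr_drop_iter=1200):
    """Sketch of encoder-side optical flow latent updating at inference time.

    `codec` is a hypothetical wrapper exposing:
      - codec.flow_estimator(x_cur, ref)  -> initial optical flow (e.g., Spynet)
      - codec.motion_encoder(flow)        -> motion latent y_m
      - codec.motion_entropy(y_m_hat)     -> estimated bits for the motion latent
      - codec.decode_frame(y_m_hat, ref)  -> reconstructed frame, residual bits
    These names are not from the paper's code.
    """
    # Initial motion latent from the (offline-enhanced) flow estimator.
    with torch.no_grad():
        flow = codec.flow_estimator(x_cur, ref_frame)
        y_m = codec.motion_encoder(flow)

    # Optimize the latent itself; the decoder network stays untouched,
    # so decoder-side complexity is unchanged.
    y_m = y_m.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([y_m], lr=lr)

    for it in range(num_iters):
        # Learning-rate schedule reported in the paper: 5e-3, halved at iter 1200.
        if it == lr_drop_iter:
            for group in optimizer.param_groups:
                group["lr"] = lr * 0.5

        # Straight-through rounding keeps quantization differentiable.
        y_m_hat = y_m + (torch.round(y_m) - y_m).detach()

        x_hat, bits_res = codec.decode_frame(y_m_hat, ref_frame)
        bits_motion = codec.motion_entropy(y_m_hat)

        # Rate-distortion objective R + lambda * D (form assumed).
        distortion = torch.mean((x_hat - x_cur) ** 2)
        loss = bits_motion + bits_res + lambda_rd * distortion

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.round(y_m.detach())
```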