Neural Rate Control for Learned Video Compression

Authors: Yiwei Zhang, Guo Lu, Yunuo Chen, Shen Wang, Yibo Shi, Jing Wang, Li Song

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that our approach can achieve accurate rate control with only 2% average bitrate error. Better yet, our method achieves nearly 10% bitrate savings compared to various baseline methods."
Researcher Affiliation | Collaboration | (1) Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; (2) Huawei Technologies, Beijing, China
Pseudocode | No | The paper provides network architecture diagrams but does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not state that its code is open-sourced and does not link to a code repository.
Open Datasets | Yes | "For training the rate implementation network, we used the Vimeo-90k dataset (Xue et al., 2019), containing 89,800 video clips. For the rate allocation network, we selected the BVI-DVC dataset (Ma et al., 2021) to leverage the rate-distortion loss of multiple frames."
Dataset Splits | No | The paper states: "The training times for the rate implementation and allocation networks are about 10 hours and 1 day, respectively," "Both networks were trained over 200,000 steps, with a batch size of 4," and "we set the GOP size to 100 during the evaluation stage." It specifies training steps and batch size but does not describe a validation split or how one was used.
Hardware Specification | No | The paper mentions that "When encoding a 1080P sequence, the inference times for these networks are just 2.95ms and 2.32ms, respectively," but does not specify the hardware (e.g., GPU model, CPU type) used for these experiments.
Software Dependencies | No | The paper mentions reimplementing DVC, FVC, DCVC, and AlphaVC as baseline models and using a method from Lin et al. (2021), but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We trained the network using randomly cropped 256×256 patches from these video sequences. Both networks were trained over 200,000 steps, with a batch size of 4. The learning rate starts at 1e-4, reducing to 1e-5 after 120,000 steps. We set the GOP size to 100 during the evaluation stage."
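The experiment setup reported above can be summarized as a small configuration sketch. This is a minimal, hypothetical rendering of the stated hyperparameters (the paper releases no code, so all names below are illustrative, not the authors' implementation):

```python
# Hedged sketch of the training schedule reported in the paper:
# 200,000 steps, batch size 4, 256x256 random crops, learning rate
# 1e-4 dropping to 1e-5 after step 120,000, GOP size 100 at evaluation.
# Identifier names here are assumptions for illustration only.

TRAIN_CONFIG = {
    "crop_size": (256, 256),   # random spatial crops from video clips
    "batch_size": 4,
    "total_steps": 200_000,
    "lr_drop_step": 120_000,   # step at which the LR is reduced
    "eval_gop_size": 100,      # GOP length used during evaluation
}

def learning_rate(step: int) -> float:
    """Step-decay schedule: 1e-4 before the drop step, 1e-5 after."""
    return 1e-4 if step < TRAIN_CONFIG["lr_drop_step"] else 1e-5
```

A training loop would query `learning_rate(step)` each iteration; e.g. `learning_rate(0)` yields 1e-4 and `learning_rate(120_000)` yields 1e-5, matching the reported schedule.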