Neural Rate Control for Learned Video Compression
Authors: Yiwei Zhang, Guo Lu, Yunuo Chen, Shen Wang, Yibo Shi, Jing Wang, Li Song
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our approach can achieve accurate rate control with only 2% average bitrate error. Better yet, our method achieves nearly 10% bitrate savings compared to various baseline methods. |
| Researcher Affiliation | Collaboration | (1) Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; (2) Huawei Technologies, Beijing, China |
| Pseudocode | No | The paper provides network architecture diagrams but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about open-sourcing its code or a link to a code repository. |
| Open Datasets | Yes | For training the rate implementation network, we used the Vimeo-90k dataset (Xue et al., 2019), containing 89,800 video clips. For the rate allocation network, we selected the BVI-DVC dataset (Ma et al., 2021) to leverage the rate-distortion loss of multiple frames. |
| Dataset Splits | No | The paper states 'The training times for the rate implementation and allocation networks are about 10 hours and 1 day, respectively.' and 'Both networks were trained over 200,000 steps, with a batch size of 4.' and 'we set the GOP size to 100 during the evaluation stage.' It specifies training steps and batch size but does not explicitly detail a validation split or how it was used. |
| Hardware Specification | No | The paper mentions 'When encoding a 1080P sequence, the inference times for these networks are just 2.95ms and 2.32ms, respectively.' but does not specify the hardware (e.g., GPU model, CPU type) used for these experiments. |
| Software Dependencies | No | The paper mentions reimplementing DVC, FVC, DCVC, and Alpha VC as baseline models and using a method from Lin et al. (2021), but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We trained the network using randomly cropped 256×256 patches from these video sequences. Both networks were trained over 200,000 steps, with a batch size of 4. The learning rate starts at 1e-4, reducing to 1e-5 after 120,000 steps. We set the GOP size to 100 during the evaluation stage. |
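The training schedule quoted in the Experiment Setup row can be sketched as a piecewise-constant learning-rate function. This is a minimal illustration of the reported hyperparameters only; the function and constant names are ours, not the authors', and the loop body is a placeholder:

```python
def learning_rate(step: int) -> float:
    """Schedule reported in the paper: 1e-4 for the first
    120,000 steps, then 1e-5 through step 200,000."""
    return 1e-4 if step < 120_000 else 1e-5

# Reported training configuration (names are illustrative).
TOTAL_STEPS = 200_000
BATCH_SIZE = 4
CROP_SIZE = 256   # random 256x256 patches from the training clips
GOP_SIZE = 100    # used at evaluation time, per the paper

if __name__ == "__main__":
    # Sanity-check the schedule at its boundary.
    print(learning_rate(0), learning_rate(119_999), learning_rate(120_000))
```

This only encodes the schedule as stated; the paper does not report the optimizer or other training details, so nothing beyond these constants is assumed here.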