HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation
Authors: Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method has been evaluated on both the UVG and MCL-JCV datasets for video compression, demonstrating significant improvement over all existing INR baselines and competitive performance when compared to learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR; see the BD-rate sketch below the table). |
| Researcher Affiliation | Collaboration | Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull. Visual Information Lab, University of Bristol, UK; Immersive Content & Comms Research, BT, UK. {hm.kwan, ge1.gao, fan.zhang, dave.bull}@bristol.ac.uk, andrew.p.gower@bt.com |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | Project page: https://hmkx.github.io/hinerv/ (The paper refers to a project page, but does not explicitly state that the source code is released or provide a direct link to a code repository within the paper itself.) |
| Open Datasets | Yes | The proposed method has been tested against existing INR-based video coding methods and state-of-the-art conventional and learning-based video codecs on the UVG [36] and MCL-JCV [51] datasets. |
| Dataset Splits | No | The paper mentions training on datasets and evaluating on test databases (UVG and MCL-JCV), but does not explicitly state the specific training/validation/test dataset splits, percentages, or sample counts. |
| Hardware Specification | Yes | We reported the encoding and decoding speeds in frames per second, measured with an A100 GPU. |
| Software Dependencies | Yes | compared HiNeRV with... two conventional codecs: HEVC/H.265 HM 18.0 (Main Profile with Random Access) [39, 40] and x265 (veryslow preset with B frames) [2] (see the example x265 invocation below the table) |
| Experiment Setup | Yes | For all models tested, we set the number of training epochs to 300 and batch size (in video frames) to 1. ... For HiNeRV, we empirically found that it is marginally better to adopt a larger learning rate of 2e-3 with global norm clipping... We used the learning rate of 5e-4, which is a common choice for the other networks... We also adopted a combination of ℓ1 loss and MS-SSIM loss (with a small window size of 5×5 rather than 11×11) for HiNeRV... we prune these three models to remove 15% of their weights and fine-tune the models for another 60 epochs. These models are further optimized with Quant-Noise [46] with a 90% noise ratio for 30 epochs. Here we use the same learning rate scheduling for fine-tuning, but employ 10% of the original learning rate in the optimization with Quant-Noise. (See the loss and pruning sketches below the table.) |
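
The bit rate savings quoted in the Research Type row are Bjøntegaard Delta (BD) rate figures computed from rate-PSNR curves. Below is a minimal BD-rate sketch, assuming the standard cubic-polynomial variant; the paper does not specify its exact BD-rate implementation, and the function name is ours:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bit-rate difference (%) between two rate-distortion curves,
    integrated over their overlapping PSNR range (cubic-fit BD-rate)."""
    log_ra = np.log10(rate_anchor)
    log_rt = np.log10(rate_test)
    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    p_a = np.polyfit(psnr_anchor, log_ra, 3)
    p_t = np.polyfit(psnr_test, log_rt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the shared PSNR interval and average.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0  # negative => bit-rate saving
```

A return value of -72.3 with HNeRV as the anchor would correspond to the 72.3% saving over HNeRV reported above.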
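The x265 anchor ("veryslow preset with B frames") can be approximated via ffmpeg's libx265 wrapper. The exact command line is not given in the paper, so the helper and flags below are assumptions; HM 18.0 is driven by its own configuration files and is not sketched here.

```python
import subprocess

def encode_x265(raw_yuv, width, height, fps, crf, out_path):
    """Hypothetical helper: encode a raw YUV 4:2:0 sequence with libx265,
    approximating the paper's x265 (veryslow preset, B frames) anchor."""
    cmd = [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", raw_yuv,
        "-c:v", "libx265", "-preset", "veryslow",  # B frames are on by default
        "-crf", str(crf),
        out_path,
    ]
    subprocess.run(cmd, check=True)

# Example: a UVG-style 1080p, 120 fps sequence at one illustrative quality point.
encode_x265("Beauty_1920x1080_120fps.yuv", 1920, 1080, 120, 28, "beauty_x265.mp4")
```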
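For the training objective, the setup states only that an ℓ1 term and an MS-SSIM term (5×5 window) are combined, with a 2e-3 learning rate and global norm clipping for HiNeRV. A minimal sketch, assuming the third-party pytorch_msssim package, an illustrative weighting alpha=0.7, a stand-in network, and max_norm=1.0 (none of which are specified above):

```python
import torch
from pytorch_msssim import ms_ssim  # third-party: pip install pytorch-msssim

def recon_loss(pred, target, alpha=0.7):
    # alpha is an assumed weighting; the paper states only that L1 and
    # MS-SSIM (5x5 window rather than 11x11) are combined.
    l1 = (pred - target).abs().mean()
    ms = ms_ssim(pred, target, data_range=1.0, win_size=5)
    return alpha * l1 + (1.0 - alpha) * (1.0 - ms)

# Stand-in network; HiNeRV itself is a hierarchical INR, not this toy.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)

frames = torch.rand(1, 3, 256, 256)  # one frame per step (batch size 1)
loss = recon_loss(model(frames), frames)
loss.backward()
# Global norm clipping as stated in the setup; max_norm=1.0 is assumed.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```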
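The 15% pruning step maps naturally onto PyTorch's built-in pruning utilities. The sketch below assumes global magnitude (L1) pruning, a criterion the setup does not confirm; Quant-Noise itself comes from fairseq [46] and is omitted here.

```python
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3, padding=1))  # stand-in

# Collect all weight tensors eligible for pruning.
params = [(m, "weight") for m in model.modules()
          if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]

# Remove 15% of the weights globally by magnitude (criterion assumed;
# the paper states only that 15% of weights are pruned).
prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                          amount=0.15)

# Bake the masks into the weights before the 60-epoch fine-tune.
for m, name in params:
    prune.remove(m, name)
```

Per the schedule above, fine-tuning then runs for 60 epochs with the same learning rate scheduling, and the Quant-Noise stage runs for 30 epochs at 10% of the original learning rate (i.e. 2e-4 for HiNeRV).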