NVRC: Neural Video Representation Compression
Authors: Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 23% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. (A sketch of how such a coding-gain figure is typically computed follows the table.) |
| Researcher Affiliation | Collaboration | Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull. Visual Information Lab, University of Bristol, UK; Immersive Content & Comms Research, BT, UK. {hm.kwan, ge1.gao, fan.zhang, dave.bull}@bristol.ac.uk, andrew.p.gower@bt.com |
| Pseudocode | No | The paper describes algorithms and processes in descriptive text and mathematical formulations but does not include formal pseudocode blocks or sections explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We will make the code and data publicly available to support the reproducibility of the results. |
| Open Datasets | Yes | To evaluate the performance of the proposed NVRC framework, we conducted experiments on the UVG [45] and MCL-JCV [59] datasets. The UVG dataset includes 7 video sequences with 300/600 frames, while the MCL-JCV dataset consists of 30 video clips with 120-150 frames. All sequences are compressed at their original resolution in this experiment. Results for the JVET-CTC Class B dataset [9] are provided in the Appendix. |
| Dataset Splits | No | The paper describes training procedures and epochs, and mentions 'validation' only in the context of the NeurIPS checklist question on reproducibility. It does not specify a validation split (e.g., percentages or sample counts used for data partitioning) in its experimental setup. |
| Hardware Specification | Yes | The complexity figures are calculated based on an NVIDIA RTX 4090 GPU with FP16. |
| Software Dependencies | No | The paper mentions conventional codecs such as 'x265', 'HM-18.0', and 'VTM-20.0' for benchmarking, and implies the use of the HiNeRV INR architecture, but it does not specify version numbers for its own software dependencies, such as the programming language (e.g., Python), deep learning libraries (e.g., PyTorch, TensorFlow), or CUDA. |
| Experiment Setup | Yes | The model is trained for 360 or 720 epochs in the first stage and 30 or 60 epochs in the second stage... The training is performed by sampling patches from the target video, where the patch size is 120 × 120 and the batch size of each training step is 144 patches (equal to 1 frame)... The learning rates in Stage 1 and Stage 2 are 2e-3 (or 1e-3 in the rare cases where training is less stable) and 1e-4, with cosine decay applied to scale the learning rate down to minimums of 1e-4 and 1e-5, respectively. Norm clipping with a threshold of 1.0 is applied. L2 regularization of 1e-6 is applied... For the soft-rounding and the Kumaraswamy noise [29] in Stage 1, the temperature and the noise scale ratio anneal from 0.5 to 0.3 and from 2.0 to 1.0, respectively. For Quant-Noise [53] in Stage 2, the noise ratio scales from 0.5 to 1.0. R is optimized once for every 8 steps of D. (Hedged sketches of these schedules follow the table.) |
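The 23% average coding gain quoted in the Research Type row is the kind of figure conventionally reported as a Bjøntegaard delta rate (BD-rate). As a rough illustration of how such a number is computed, the sketch below fits log-rate as a cubic polynomial of PSNR for two rate-distortion curves and integrates the gap over their shared quality range. The RD points are hypothetical and not taken from the paper; this is a generic BD-rate routine, not NVRC's evaluation code.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average bitrate difference (%) between two
    rate-distortion curves over their overlapping PSNR range. Negative
    values mean the test codec needs fewer bits (a coding gain)."""
    lr_a = np.log(np.asarray(rate_anchor, dtype=float))
    lr_t = np.log(np.asarray(rate_test, dtype=float))
    # Fit log-rate as a cubic polynomial of PSNR for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fits over the shared PSNR interval and average the gap.
    ia = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    it = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (it - ia) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Hypothetical RD points (kbps, dB); a result near -23% would correspond to
# the average gain the paper reports over VTM on UVG.
anchor = ([1000, 2000, 4000, 8000], [34.0, 36.1, 38.0, 39.8])
test   = ([ 800, 1600, 3100, 6200], [34.2, 36.3, 38.1, 39.9])
print(f"BD-rate: {bd_rate(anchor[0], anchor[1], test[0], test[1]):.1f}%")
```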
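A minimal sketch of the two-stage training schedule described in the Experiment Setup row, assuming a PyTorch implementation. The `model`, `sample_patches`, and `rate_loss` stand-ins are hypothetical placeholders; only the numbers (learning rates with cosine decay, norm clipping at 1.0, L2 regularization of 1e-6, 144 patches of 120 × 120 per step, and the rate term R touched once per 8 distortion steps) come from the paper's description.

```python
import math
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 64), nn.GELU(), nn.Linear(64, 3))   # stand-in for the INR
opt = torch.optim.Adam(model.parameters(), lr=2e-3, weight_decay=1e-6)  # L2 regularization 1e-6

def cosine_lr(step, total, lr_max, lr_min):
    # Cosine decay from lr_max down to lr_min over `total` steps.
    t = min(step / max(total, 1), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

def sample_patches(n_patches=144, patch=120):
    # Stand-in for sampling 144 patches of 120x120 from the target video
    # (144 such patches tile one 1920x1080 frame exactly: 16 x 9 patches).
    coords = torch.rand(n_patches, 2)
    target = torch.rand(n_patches, 3)
    return coords, target

def rate_loss():
    # Stand-in for the entropy-model rate estimate R (zeroed placeholder).
    return model[0].weight.abs().mean() * 0.0

stage1_steps = 360 * 300  # 360 epochs x a hypothetical 300 steps per epoch
for step in range(stage1_steps):
    for g in opt.param_groups:
        g["lr"] = cosine_lr(step, stage1_steps, 2e-3, 1e-4)  # Stage 2: 1e-4 -> 1e-5
    coords, target = sample_patches()
    loss = ((model(coords) - target) ** 2).mean()       # distortion term D
    if step % 8 == 0:                                   # R optimized once per 8 steps of D
        loss = loss + rate_loss()
    opt.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 1.0)   # norm clipping at 1.0
    opt.step()
```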
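The Stage 1 quantization surrogate combines soft-rounding (a differentiable relaxation of round(), per Agustsson and Theis, 2020) with noise drawn from a Kumaraswamy distribution. The sketch below assumes the 0.5 → 0.3 schedule is the soft-rounding temperature and the 2.0 → 1.0 schedule is the Kumaraswamy shape parameter, with the second shape parameter solved so the mode sits at 0.5 (shape 1 recovers uniform noise); these readings follow related INR codecs and are not confirmed details of NVRC.

```python
import math
import torch

def soft_round(x, tau):
    # Differentiable surrogate for round(): approaches hard rounding as
    # tau -> 0 and the identity as tau -> infinity.
    base = torch.floor(x)
    delta = x - base - 0.5
    return base + 0.5 + 0.5 * torch.tanh(delta / tau) / math.tanh(0.5 / tau)

def kumaraswamy_noise(shape, a):
    # Zero-centred noise on (-0.5, 0.5) via inverse-CDF sampling of a
    # Kumaraswamy(a, b) distribution, with b chosen so the mode is 0.5.
    # a = 1 gives b = 1, i.e. plain uniform noise.
    b = ((a - 1.0) * 2.0 ** a + 1.0) / a
    u = torch.rand(shape)
    z = (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)
    return z - 0.5

def anneal(step, total, start, end):
    # Linear schedule from start to end; the paper does not state the
    # interpolation shape, so linear is an assumption here.
    t = min(step / max(total, 1), 1.0)
    return start + (end - start) * t

# Stage 1 schedules from the paper: temperature 0.5 -> 0.3, noise 2.0 -> 1.0.
w = torch.randn(4) * 3.0
for step in (0, 500, 999):
    tau = anneal(step, 1000, 0.5, 0.3)
    a = anneal(step, 1000, 2.0, 1.0)
    w_hat = soft_round(w + kumaraswamy_noise(w.shape, a), tau)
    print(step, w_hat)
```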
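Stage 2 switches to Quant-Noise [53] (Fan et al., 2021), which quantizes a random subset of parameters in each forward pass, keeps the rest in full precision, and passes gradients straight through. A minimal sketch assuming simple round-to-step scalar quantization; the step size and parameter shapes are illustrative, while the 0.5 → 1.0 ratio schedule is from the paper, so training ends with every weight quantized in the forward pass.

```python
import torch

def quant_noise(w, p, step_size=1.0):
    # Quantize a random fraction p of the weights, leave the rest in full
    # precision, and use a straight-through estimator for the gradient.
    q = torch.round(w / step_size) * step_size
    mask = (torch.rand_like(w) < p).float()
    mixed = mask * q + (1.0 - mask) * w
    return w + (mixed - w).detach()

# Noise ratio p anneals from 0.5 to 1.0 over Stage 2.
w = torch.randn(8, requires_grad=True)
total = 100
for step in range(total):
    p = 0.5 + 0.5 * step / (total - 1)
    w_q = quant_noise(w, p)
    loss = (w_q ** 2).sum()   # placeholder objective for the demo
    loss.backward()
    w.grad = None             # reset between demo steps
```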