Scalable Neural Video Representations with Learnable Positional Features

Authors: Subin Kim, Sihyun Yu, Jaeho Lee, Jinwoo Shin

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify the effectiveness of our method on the popular UVG benchmark [32]. In particular, NVP achieves the peak signal-to-noise ratio (PSNR; higher is better) metric of 34.57 in 5 minutes (with a single NVIDIA V100 32GB GPU): it is achieved >2 times faster, even with using >8 times fewer parameters than the state-of-the-art on compute-efficiency that reaches 34.07 in 10 minutes. Moreover, compared with prior arts on encoding quality, our method improves the learned perceptual image patch similarity (LPIPS [58]; lower is better) as 0.145 → 0.102 (+29.7%) with a similar number of parameters while requiring 72.5% less training time. (Section 4, Experiments)
Researcher Affiliation | Academia | Korea Advanced Institute of Science and Technology (KAIST); Pohang University of Science and Technology (POSTECH)
Pseudocode | No | The paper describes its architecture and procedures in detail but does not present them in a structured pseudocode or algorithm block.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Section 4 and supplementary material.
Open Datasets | Yes | We verify the effectiveness of our framework on UVG-HD [32], a representative benchmark for evaluating video encodings. [32] UVG dataset: 50/120fps 4K sequences for video codec analysis and development. In Proceedings of the 11th ACM Multimedia Systems Conference (MMSys '20), pages 297-302, New York, NY, USA, 2020. Association for Computing Machinery. ISBN 9781450368452. doi: 10.1145/3339825.3394937.
Dataset Splits | No | The paper evaluates on specific videos (e.g., UVG-HD, Big Buck Bunny) and reports processing times, but does not provide train/validation/test split details (e.g., percentages, sample counts, or specific split files) for reproduction.
Hardware Specification | Yes | All main experiments, including baselines, are processed with a single GPU (NVIDIA V100 32GB) and 28 instances from a virtual CPU (Intel Xeon Platinum 8168 CPU @ 2.70GHz).
Software Dependencies | No | The paper mentions using PyTorch [36] but does not specify its version number or any other software dependencies with explicit version details.
Experiment Setup | No | The paper describes architectural components and variants (e.g., NVP-S and NVP-L with different latent code dimensions) and discusses training times, but does not provide specific numerical hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed training configurations in the main text; it defers these details to Appendix A.2.
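The PSNR figures quoted in the table (34.57 vs. 34.07) follow the standard definition, PSNR = 10 · log10(MAX² / MSE). A minimal sketch of that computation is below; the `psnr` helper is our own illustration, not code from the paper, and LPIPS is not shown since it requires a pretrained perceptual network (e.g., the `lpips` package) rather than a closed-form formula.

```python
import numpy as np

def psnr(reference, distorted, max_val=255.0):
    """Peak signal-to-noise ratio between two images (higher is better)."""
    reference = np.asarray(reference, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    mse = np.mean((reference - distorted) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a uniform offset of 16 on an 8-bit image gives MSE = 256,
# so PSNR = 10 * log10(255^2 / 256) ≈ 24.05 dB.
ref = np.zeros((64, 64), dtype=np.uint8)
dist = ref + 16
print(round(psnr(ref, dist), 2))  # -> 24.05
```

For video, the paper's per-sequence numbers would correspond to averaging such per-frame (or per-pixel) errors over all frames before taking the logarithm.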