Hierarchical Autoregressive Modeling for Neural Video Compression

Authors: Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.
Researcher Affiliation | Academia | Department of Computer Science, UC Irvine; Computation & Neural Systems, California Institute of Technology
Pseudocode | Yes | Pseudocode is available in Appendix A.3: Algorithm 1, an efficient algorithm to build a scale-space 3D tensor (an illustrative sketch of this kind of construction follows the table).
Open Source Code | No | We release YouTube-NT in the form of customizable scripts to facilitate future compression research. (Footnote 1: https://github.com/privateyoung/Youtube-NT.) This link is for the dataset generation scripts, not the main model's source code.
Open Datasets | Yes | Vimeo-90k (Xue et al., 2019) consists of 90,000 clips... YouTube-NT is our new dataset: we collected 8,000 nature videos and movie/video-game trailers from youtube.com and processed them into 300k high-resolution (720p) clips, which we refer to as YouTube-NT. We release YouTube-NT in the form of customizable scripts to facilitate future compression research (a hypothetical example of such a processing step follows the table).
Dataset Splits | No | The paper does not explicitly mention a validation split (e.g., an 80/10/10 split or specific counts for a validation set).
Hardware Specification | Yes | Training time is about four days on an NVIDIA Titan RTX.
Software Dependencies | No | The paper mentions the 'Adam optimizer (Kingma & Ba, 2015)' and 'ffmpeg' commands but does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages.
Experiment Setup | Yes | All models are trained on three consecutive frames with a batch size of 8; the frames are randomly selected from each clip and then randomly cropped to 256x256. Models are trained with an MSE loss, following a procedure similar to Agustsson et al. (2020) (see Appendix A.2 for details). The Adam optimizer (Kingma & Ba, 2015) is used for 1,050,000 steps; the initial learning rate of 1e-4 is decayed to 1e-5 after 900,000 steps, and the crop size is increased to 384x384 for the last 50,000 steps (a configuration sketch follows the table).
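
The pseudocode row refers to the paper's Algorithm 1 for building a scale-space 3D tensor; the exact procedure is given in Appendix A.3 of the paper. Below is a minimal, illustrative sketch of the general idea only (not the authors' algorithm), assuming PyTorch and a repeated small-Gaussian-blur construction; the kernel size, sigma, and number of scales are arbitrary choices.

```python
# Illustrative sketch (not the paper's Algorithm 1): build a scale-space 3D
# tensor by repeatedly blurring a frame with a small Gaussian and stacking the
# results along a new scale dimension. Kernel size, sigma, and num_scales are
# assumptions for demonstration.
import torch
import torch.nn.functional as F

def gaussian_kernel2d(kernel_size: int = 5, sigma: float = 1.0) -> torch.Tensor:
    """Return a normalized 2D Gaussian kernel of shape (1, 1, k, k)."""
    coords = torch.arange(kernel_size, dtype=torch.float32) - (kernel_size - 1) / 2
    g1d = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g1d = g1d / g1d.sum()
    g2d = torch.outer(g1d, g1d)
    return g2d.view(1, 1, kernel_size, kernel_size)

def scale_space_volume(frame: torch.Tensor, num_scales: int = 5) -> torch.Tensor:
    """frame: (B, C, H, W) -> (B, C, num_scales, H, W).

    Each successive slice is a further-blurred copy of the previous one,
    exploiting the fact that composing Gaussian blurs yields a wider Gaussian.
    """
    b, c, h, w = frame.shape
    kernel = gaussian_kernel2d().to(frame.device).repeat(c, 1, 1, 1)
    slices = [frame]
    current = frame
    for _ in range(num_scales - 1):
        # Depthwise convolution blurs each channel independently.
        current = F.conv2d(current, kernel, padding=2, groups=c)
        slices.append(current)
    return torch.stack(slices, dim=2)

# Usage: a dummy 720p RGB frame.
volume = scale_space_volume(torch.rand(1, 3, 720, 1280))
print(volume.shape)  # torch.Size([1, 3, 5, 720, 1280])
```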
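
The open-datasets row states that YouTube-NT was built by downloading videos and processing them into 720p clips with customizable scripts. The released scripts define the actual pipeline; the snippet below is only a hypothetical illustration of one such processing step, calling ffmpeg through Python's subprocess module, with the paths, clip length, and filter settings assumed rather than taken from the paper.

```python
# Hypothetical sketch of a dataset-generation step: cut a downloaded video
# into a 720p clip with ffmpeg. Paths, clip duration, and encoder defaults
# are assumptions, not details from the YouTube-NT scripts.
import subprocess

def extract_clip(src: str, dst: str, start_s: float, dur_s: float) -> None:
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start_s),    # seek to the clip start
            "-i", src,
            "-t", str(dur_s),       # clip duration in seconds
            "-vf", "scale=-2:720",  # rescale to 720p, preserving aspect ratio
            "-an",                  # drop the audio stream
            dst,
        ],
        check=True,
    )

# Usage: cut a 4-second 720p clip starting at t=30s.
# extract_clip("raw/video_0001.mp4", "clips/video_0001_000.mp4", 30.0, 4.0)
```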
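
The experiment-setup row describes a concrete optimization schedule. The paper does not name a deep-learning framework (see the software-dependencies row), so the sketch below assumes PyTorch; the model and the data-loading function are placeholders, and only the batch size, step counts, learning-rate decay, and crop-size switch follow the reported setup.

```python
# Minimal sketch of the reported training schedule. PyTorch is assumed (the
# framework is not specified in the paper); `model` and `get_batch` are
# placeholders, and the loss shown is only the MSE reconstruction term.
import torch
import torch.nn.functional as F

TOTAL_STEPS = 1_050_000
LR_DECAY_STEP = 900_000       # lr: 1e-4 -> 1e-5 after this step
CROP_SWITCH_STEP = 1_000_000  # last 50,000 steps use 384x384 crops
BATCH_SIZE = 8

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the video codec
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def get_batch(crop_size: int) -> torch.Tensor:
    """Placeholder: three consecutive frames per clip, randomly cropped."""
    return torch.rand(BATCH_SIZE, 3, 3, crop_size, crop_size)  # (B, T, C, H, W)

for step in range(TOTAL_STEPS):  # in practice this loop runs for days
    if step == LR_DECAY_STEP:
        for group in optimizer.param_groups:
            group["lr"] = 1e-5
    crop = 256 if step < CROP_SWITCH_STEP else 384
    frames = get_batch(crop)
    recon = model(frames[:, -1])  # placeholder forward pass on the last frame
    loss = F.mse_loss(recon, frames[:, -1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```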