Hierarchical Autoregressive Modeling for Neural Video Compression
Authors: Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods. |
| Researcher Affiliation | Academia | Department of Computer Science, UC Irvine; Computation & Neural Systems, California Institute of Technology |
| Pseudocode | Yes | Pseudocode is available in Appendix A.3 ("Algorithm 1: An efficient algorithm to build a scale-space 3D tensor"). A hedged sketch of a generic scale-space tensor construction appears after this table. |
| Open Source Code | No | We release YouTube-NT in the form of customizable scripts to facilitate future compression research. (Footnote: https://github.com/privateyoung/Youtube-NT). This link points to the dataset-generation scripts, not to the main model's source code. |
| Open Datasets | Yes | Vimeo-90k (Xue et al., 2019) consists of 90,000 clips... YouTube-NT. This is our new dataset. We collected 8,000 nature videos and movie/video-game trailers from youtube.com and processed them into 300k high-resolution (720p) clips, which we refer to as YouTube-NT. We release YouTube-NT in the form of customizable scripts to facilitate future compression research. |
| Dataset Splits | No | The paper does not explicitly mention a validation split (e.g., an 80/10/10 split or specific counts for a validation set). |
| Hardware Specification | Yes | Training time is about four days on an NVIDIA Titan RTX. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2015)' and 'ffmpeg' commands but does not provide specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages. |
| Experiment Setup | Yes | All models are trained on three consecutive frames with a batch size of 8; the frames are randomly selected from each clip and then randomly cropped to 256x256. We trained with an MSE loss, following a similar procedure to Agustsson et al. (2020) (see Appendix A.2 for details). We use the Adam optimizer (Kingma & Ba, 2015), training the models for 1,050,000 steps. The initial learning rate of 1e-4 is decayed to 1e-5 after 900,000 steps, and the crop size is increased to 384x384 for the last 50,000 steps. A sketch of this schedule follows the table. |
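The paper's Algorithm 1 (Appendix A.3) is not reproduced here. As a rough illustration of the underlying idea, the following is a minimal PyTorch sketch that builds a scale-space 3D tensor by stacking progressively Gaussian-blurred copies of a frame along a new scale axis, in the spirit of scale-space flow (Agustsson et al., 2020). The function names (`gaussian_kernel2d`, `build_scale_space_volume`) and the default `num_scales` and `base_sigma` values are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def gaussian_kernel2d(sigma: float, radius: int) -> torch.Tensor:
    """2D Gaussian kernel of shape (2*radius+1, 2*radius+1), normalized to sum to 1."""
    x = torch.arange(-radius, radius + 1, dtype=torch.float32)
    g1d = torch.exp(-0.5 * (x / sigma) ** 2)
    g1d = g1d / g1d.sum()
    return torch.outer(g1d, g1d)


def build_scale_space_volume(frame: torch.Tensor, num_scales: int = 5,
                             base_sigma: float = 1.5) -> torch.Tensor:
    """Stack progressively blurred copies of `frame` along a new scale axis.

    frame: (B, C, H, W) tensor.
    Returns a (B, C, num_scales + 1, H, W) volume whose scale-0 slice is the
    unblurred frame; blur is applied cumulatively, so the effective sigma
    grows with the scale index.
    """
    radius = max(1, int(3 * base_sigma))
    kernel = gaussian_kernel2d(base_sigma, radius).to(frame)
    channels = frame.shape[1]
    # Depthwise convolution: one copy of the kernel per channel.
    weight = kernel.expand(channels, 1, -1, -1).contiguous()

    levels = [frame]
    blurred = frame
    for _ in range(num_scales):
        padded = F.pad(blurred, [radius] * 4, mode="reflect")
        blurred = F.conv2d(padded, weight, groups=channels)
        levels.append(blurred)
    return torch.stack(levels, dim=2)


if __name__ == "__main__":
    volume = build_scale_space_volume(torch.rand(1, 3, 64, 96))
    print(volume.shape)  # torch.Size([1, 3, 6, 64, 96])
```

In scale-space flow, a volume of this form is sampled with a three-component (x, y, scale) flow field, so larger scale displacements effectively blur the warped prediction. The paper's Appendix A.3 describes an efficient construction; the sketch above only shows the naive cumulative-blur stacking.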
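The training schedule quoted in the Experiment Setup row can be expressed as two small helper functions. This is a sketch of the stated schedule only (1,050,000 Adam steps, learning rate 1e-4 decayed to 1e-5 after 900,000 steps, 256x256 crops switched to 384x384 for the last 50,000 steps); the helper names are ours, and the model and data pipeline are not shown.

```python
# Constants are taken from the paper's stated training setup;
# the function names below are illustrative, not the authors' code.
TOTAL_STEPS = 1_050_000
LR_DECAY_STEP = 900_000                   # 1e-4 -> 1e-5 after this step
CROP_SWITCH_STEP = TOTAL_STEPS - 50_000   # last 50k steps use larger crops


def learning_rate(step: int) -> float:
    """Adam learning rate: 1e-4 initially, decayed to 1e-5 after 900k steps."""
    return 1e-4 if step < LR_DECAY_STEP else 1e-5


def crop_size(step: int) -> int:
    """Random-crop side length: 256 for most of training, 384 for the last 50k steps."""
    return 256 if step < CROP_SWITCH_STEP else 384


if __name__ == "__main__":
    for step in (0, LR_DECAY_STEP, CROP_SWITCH_STEP, TOTAL_STEPS - 1):
        print(step, learning_rate(step), crop_size(step))
```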