On the Choice of Perception Loss Function for Learned Video Compression

Authors: Sadaf Salehkalaibar, Truong Buu Phan, Jun Chen, Wei Yu, Ashish Khisti

NeurIPS 2023

Reproducibility assessment: each entry below gives the variable, the result, and the supporting LLM response (direct quotes from the paper appear in quotation marks).

Research Type: Experimental
"Using information theoretic analysis and deep-learning based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low-bit rates. ... We validate our results using (one-shot) information-theoretic analysis, detailed study of the rate-distortion-perception tradeoff of the Gauss-Markov source model as well as deep-learning based experiments on moving MNIST and KTH datasets."

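For context, the rate-distortion-perception (RDP) tradeoff referenced above is, in the form introduced by Blau and Michaeli (2019), a constrained mutual-information minimization. The formula below is background from the RDP literature, not a formula quoted from this paper; the paper generalizes the setting to video sources with framewise and joint perception loss functions.

```latex
% Standard rate-distortion-perception function (background, not from the
% paper): d(.,.) is a distortion measure and \phi(.,.) a divergence between
% the source and reconstruction distributions, i.e., the PLF.
\[
R(D, P) \;=\; \min_{p_{\hat{X}\mid X}} \; I(X;\hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\bigl[d(X,\hat{X})\bigr] \le D,
\qquad
\phi\bigl(p_X, p_{\hat{X}}\bigr) \le P.
\]
```
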
Researcher Affiliation: Academia
Sadaf Salehkalaibar, ECE Department, University of Toronto (sadafs@ece.utoronto.ca); Truong Buu Phan, ECE Department, University of Toronto (truong.phan@mail.utoronto.ca); Jun Chen, ECE Department, McMaster University (chenjun@mcmaster.ca); Wei Yu, ECE Department, University of Toronto (weiyu@ece.utoronto.ca); Ashish Khisti, ECE Department, University of Toronto (akhisti@ece.utoronto.ca)

Pseudocode: No
The paper does not contain any explicitly labeled pseudocode or algorithm blocks.

Open Source Code: No
"Code will be available at https://github.com/truongbuu/URDP_flow." (The paper only promises a future release; no released code accompanied it at assessment time.)

Open Datasets: Yes
"We validate our results using ... deep-learning based experiments on moving MNIST and KTH datasets. ... Moving MNIST dataset [29] (with 1 digit) using Wasserstein GAN [30] ... Additional results on the KTH dataset [31] are available in Appendix J.3."

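As background on the first dataset: Moving MNIST-style sequences are commonly generated by bouncing static MNIST digits inside a larger frame (Srivastava et al., 2015). A minimal sketch follows; the frame size, sequence length, and velocity range are illustrative assumptions, not parameters taken from the paper.

```python
# Hypothetical sketch of single-digit Moving MNIST-style generation,
# matching the "(with 1 digit)" note in the quote above. Frame size,
# sequence length, and velocities are assumptions for illustration.
import numpy as np
from torchvision.datasets import MNIST

FRAME = 64     # output frame size (assumed)
DIGIT = 28     # MNIST digit size
SEQ_LEN = 10   # frames per sequence (assumed)

def make_sequence(digit: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Bounce one 28x28 digit inside a FRAME x FRAME canvas for SEQ_LEN steps."""
    pos = rng.uniform(0, FRAME - DIGIT, size=2)   # top-left corner
    vel = rng.uniform(-3, 3, size=2)              # pixels per frame
    frames = np.zeros((SEQ_LEN, FRAME, FRAME), dtype=np.float32)
    for t in range(SEQ_LEN):
        # Reflect velocity at the borders so the digit bounces.
        for k in range(2):
            if not 0 <= pos[k] + vel[k] <= FRAME - DIGIT:
                vel[k] = -vel[k]
        pos += vel
        r, c = int(pos[0]), int(pos[1])
        frames[t, r:r + DIGIT, c:c + DIGIT] = digit / 255.0
    return frames

rng = np.random.default_rng(0)
mnist = MNIST(root="./data", train=True, download=True)
digit = np.array(mnist.data[0])       # first training digit, 28x28
seq = make_sequence(digit, rng)       # shape: (SEQ_LEN, 64, 64)
```
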
Dataset Splits: No
The paper mentions that the "training set contains 60000 images" but does not provide specific train/validation/test splits or a clear splitting methodology.

Hardware Specification: Yes
"Training takes 2 days per model on a single NVIDIA P100 GPU."

Software Dependencies: No
The paper mentions software such as the Wasserstein GAN, the scale-space flow model, a conditional module, and the WGAN-GP framework, but does not provide version numbers for these or any other software dependencies.

Experiment Setup: Yes
"We use a batch size of 64, RMSProp optimizer with a learning rate of 5 × 10⁻⁵, and train each model with 360 epochs, where the training set contains 60000 images. ... Under WGAN-GP framework [30], we use the gradient penalty of 10 and update the encoders/decoders for every 5 iterations. The parameters λ controlling the tradeoff are in Table 7."

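To make the quoted setup concrete, below is a minimal, hypothetical WGAN-GP training loop using those hyperparameters (batch size 64, RMSProp at 5 × 10⁻⁵, gradient penalty 10, encoders/decoders updated once every 5 iterations). The tiny placeholder networks, dummy data, and the tradeoff weight LAMBDA are assumptions; the paper's actual model is a scale-space-flow-based video codec, and its rate term is omitted here for brevity.

```python
# Hypothetical sketch of the WGAN-GP setup quoted above. Only the batch
# size, optimizer, learning rate, gradient penalty, and the one-update-
# per-5-iterations schedule come from the paper; everything else
# (models, data, LAMBDA) is a placeholder assumption.
import torch
import torch.nn as nn

BATCH, LR, GP_WEIGHT, N_CRITIC, LAMBDA = 64, 5e-5, 10.0, 5, 0.1

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: (||grad_x critic(x)||_2 - 1)^2 at random interpolates."""
    a = torch.rand(real.size(0), 1, 1, 1)
    x = (a * real + (1 - a) * fake).requires_grad_(True)
    (g,) = torch.autograd.grad(critic(x).sum(), x, create_graph=True)
    return ((g.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# Placeholder modules standing in for the paper's video codec and critic.
codec = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 64 * 64),
                      nn.Sigmoid(), nn.Unflatten(1, (1, 64, 64)))
critic = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))

opt_c = torch.optim.RMSprop(critic.parameters(), lr=LR)
opt_g = torch.optim.RMSprop(codec.parameters(), lr=LR)

frames = [torch.rand(BATCH, 1, 64, 64) for _ in range(20)]  # dummy data

for step, real in enumerate(frames):
    # Critic step every iteration: push critic(real) up, critic(fake) down.
    fake = codec(real).detach()
    loss_c = (critic(fake).mean() - critic(real).mean()
              + GP_WEIGHT * gradient_penalty(critic, real, fake))
    opt_c.zero_grad()
    loss_c.backward()
    opt_c.step()

    # Encoder/decoder step once every N_CRITIC iterations: distortion (MSE)
    # plus a LAMBDA-weighted adversarial term acting as the perception loss.
    if step % N_CRITIC == N_CRITIC - 1:
        rec = codec(real)
        loss_g = nn.functional.mse_loss(rec, real) - LAMBDA * critic(rec).mean()
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
```

The schedule mirrors standard WGAN-GP practice: the critic trains every iteration while the compression networks train once per N_CRITIC steps, with the critic score serving as the perception term of the rate-distortion-perception objective.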