Learning Binary Residual Representations for Domain-Specific Video Streaming

Authors: Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we show that our pipeline yields consistent gains over standard H.264 compression across several benchmark datasets while using the same channel bandwidth. ... We conduct extensive experiments to verify our hypothesis that we can improve state-of-the-art video compression methods by learning to encode the domain-specific residual information in a binary form. We also compare various ways of training the proposed binary autoencoder for encoding the residual information. On the KITTI (Geiger, Lenz, and Urtasun 2012) and three games video datasets, we show that our method consistently outperforms H.264 both quantitatively and qualitatively. (An illustrative sketch of such a binary residual autoencoder appears after the table.)
Researcher Affiliation | Collaboration | Yi-Hsuan Tsai (1), Ming-Yu Liu (2), Deqing Sun (2), Ming-Hsuan Yang (1,2), Jan Kautz (2); 1: University of California, Merced; 2: NVIDIA
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology. The provided link 'http://research.nvidia.com/publication/201802 Learning Binary Residual' is stated to contain 'more visualization results', not source code.
Open Datasets | Yes | We evaluate our pipeline using the KITTI (Geiger, Lenz, and Urtasun 2012) dataset, which consists of various driving sequences of street scenes, and three popular video games: Assassin's Creed, Skyrim and Borderlands. ... We use the tracking benchmark on the KITTI dataset that contains 50 street-view videos.
Dataset Splits | No | The paper states: 'We randomly select 42 videos for training and 8 videos for testing.' While it specifies training and testing splits, it does not explicitly mention a separate validation dataset split.
Hardware Specification | Yes | On the server side, our encoder and the binarizer takes about 0.001 seconds to compress the residual image with a resolution of 360 × 1200 using a Titan X GPU. The decoder on the client side takes 0.001 seconds to reconstruct the residual image.
Software Dependencies | No | The paper states 'The implementation is based on PyTorch', but does not specify a version number or any other software dependencies with version numbers.
Experiment Setup | Yes | Throughout the paper, we use Adam (Kingma and Ba 2015) to train our binary residual autoencoder. The learning rate is set to 10^-3 and then decreased by half for every 5 epochs. The momentums are set to 0.9 and 0.999. The batch size is 10, and we train the model for 50 epochs. (A minimal PyTorch sketch of this configuration follows the table.)
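
The quoted passages describe a binary residual autoencoder that learns to encode, in binary form, the residual left over by H.264 compression. As an illustration only, the sketch below shows one common way such a binary bottleneck can be built in PyTorch: a sign-function binarizer trained with a straight-through gradient inside a small convolutional autoencoder. The class name, layer sizes, and binarization scheme here are assumptions for illustration, not the authors' released architecture.

import torch
import torch.nn as nn

class Binarizer(torch.autograd.Function):
    # Hard sign in the forward pass; straight-through (identity) gradient in the
    # backward pass. This is one standard way to train a binary bottleneck; the
    # paper's exact binarization scheme may differ.
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)  # code values in {-1, +1}
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output    # pass gradients straight through the binarizer

class BinaryResidualAutoencoder(nn.Module):
    # Toy convolutional autoencoder with a binary bottleneck, operating on
    # residual images (original frame minus H.264-decoded frame).
    def __init__(self, channels=3, code_channels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, code_channels, 4, stride=2, padding=1), nn.Tanh(),  # bound to [-1, 1] before binarizing
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_channels, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, residual):
        code = Binarizer.apply(self.encoder(residual))  # binary code transmitted to the client
        return self.decoder(code)                       # reconstructed residual on the client side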
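The experiment-setup row maps directly onto standard PyTorch components: Adam with a learning rate of 10^-3 and momentums (betas) of 0.9 and 0.999, a batch size of 10, 50 training epochs, and a learning rate halved every 5 epochs, which corresponds to a StepLR schedule. The snippet below is a hedged reconstruction using the toy autoencoder from the sketch above and synthetic residual data; the loss function and data pipeline are assumptions, since the paper's code is not available.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for residual patches; the real pipeline would use residuals
# between raw frames and their H.264-compressed counterparts.
residuals = torch.randn(100, 3, 64, 64)
loader = DataLoader(TensorDataset(residuals), batch_size=10, shuffle=True)  # "The batch size is 10"

model = BinaryResidualAutoencoder()  # toy model from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))       # lr 10^-3, momentums 0.9 / 0.999
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)      # halve the lr every 5 epochs
criterion = torch.nn.MSELoss()  # reconstruction loss on the residual; the exact loss is an assumption

for epoch in range(50):  # "we train the model for 50 epochs"
    for (batch,) in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch), batch)  # reconstruct the residual itself
        loss.backward()
        optimizer.step()
    scheduler.step()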