Is Overfitting Necessary for Implicit Video Representation?

Authors: Hee Min Choi, Hyoa Kang, Dokwan Oh

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on popular UVG benchmark show that random subnetworks obtained from our framework achieve higher reconstruction and visual quality than fully trained models with similar encoding sizes.
Researcher Affiliation | Industry | Samsung Advanced Institute of Technology, Samsung Electronics, Suwon, Republic of Korea.
Pseudocode | Yes | A pseudo code of training the proposed algorithm is described in Appendix A.1.
Open Source Code | No | The paper does not explicitly state that the source code for the methodology described is publicly available or provide a link to it. A link to a baseline (NeRV) code is provided, but not for the authors' own work.
Open Datasets | Yes | Dataset: Following prior video INR methods (e.g., Chen et al., 2021a; Li et al., 2022), we demonstrate the effectiveness of our framework on UVG dataset (Mercat et al., 2020), a widely used benchmark for video compression.
Dataset Splits | No | The paper does not explicitly specify training, validation, and test dataset splits by percentages, counts, or references to predefined split files.
Hardware Specification | Yes | We used a single NVIDIA A100 GPU (80GB) and 4 batches throughout this experiment.
Software Dependencies | No | The paper mentions 'ffmpeg (Tomar, 2006)', 'Adam optimizer (Kingma & Ba, 2015)', and 'pytorch library' but does not provide specific version numbers for these software components.
Experiment Setup | Yes | Hyperparameters: For our experiments with UVG dataset (Mercat et al., 2020), hyperparameters b and l in the positional embedding (10) are set to be 1.25 and 80, respectively, and the output channel of the first layer in the MLP is 512. We use the upscale factors (5, 3, 2, 2, 2) in NeRV blocks, and GELU (Hendrycks & Gimpel, 2016) activation as suggested by the authors. Training Details of the Proposed Framework: We train our framework for 200 epochs using Adam optimizer (Kingma & Ba, 2015) with a cosine learning rate scheduler and 4 batches on a single NVIDIA A100 GPU (80GB). Multiple learning rates ranging from 0.015 to 0.200 are swept over and the best results are reported. Throughout our experiments, networks are trained in full precision (FP32). We use 3 levels of supermasks with k_1 = 0.2, and the other densities {k_n} for n = 2, 3 are chosen by the linear method of Okoshi et al. (2022) unless specified.
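
The Experiment Setup row describes a NeRV-style architecture trained through supermasks over frozen random weights rather than through weight updates. The sketch below shows, in PyTorch, one plausible way the reported hyperparameters could be wired together. It is a minimal sketch, assuming a NeRV-style sin/cos embedding for equation (10) and an edge-popup-style straight-through mask estimator; the function and class names are illustrative and are not the authors' released code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def positional_embedding(t: torch.Tensor, b: float = 1.25, l: int = 80) -> torch.Tensor:
    """Embed a normalized frame index t in [0, 1] as sin/cos pairs.

    b = 1.25 and l = 80 follow the reported hyperparameters; the sin/cos form is
    assumed from the NeRV baseline, not copied from the paper's equation (10).
    """
    freqs = b ** torch.arange(l, dtype=t.dtype, device=t.device)        # (l,)
    angles = math.pi * t[:, None] * freqs[None, :]                      # (batch, l)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)    # (batch, 2*l)

class SupermaskLinear(nn.Module):
    """Edge-popup-style supermask over a frozen, randomly initialized layer.

    The weights stay fixed; only the scores are trained, and the top `density`
    fraction of weights (by score magnitude) is kept in each forward pass.
    density = 0.2 matches the reported first-level value k_1.
    """
    def __init__(self, in_features: int, out_features: int, density: float = 0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        self.scores = nn.Parameter(torch.randn(out_features, in_features).abs() * 0.01)
        self.density = density

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = self.scores.numel()
        k = max(1, int(n * self.density))
        threshold = self.scores.detach().abs().flatten().kthvalue(n - k + 1).values
        mask = (self.scores.abs() >= threshold).float()
        # Straight-through estimator: forward pass uses the hard mask,
        # gradients flow back to the scores.
        mask = mask.detach() + self.scores - self.scores.detach()
        return F.linear(x, self.weight * mask)

# Training configuration as reported: 200 epochs, Adam, cosine learning-rate schedule,
# learning rates swept over [0.015, 0.200], full-precision (FP32) training.
layer = SupermaskLinear(in_features=2 * 80, out_features=512, density=0.2)
optimizer = torch.optim.Adam([layer.scores], lr=0.015)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```

In the paper's framework the masked modules are NeRV blocks with upscale factors (5, 3, 2, 2, 2) and GELU activations rather than a single linear layer, and three mask levels are trained with densities {k_n} set by the linear rule of Okoshi et al. (2022).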