Is Overfitting Necessary for Implicit Video Representation?
Authors: Hee Min Choi, Hyoa Kang, Dokwan Oh
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the popular UVG benchmark show that random subnetworks obtained from our framework achieve higher reconstruction and visual quality than fully trained models with similar encoding sizes. |
| Researcher Affiliation | Industry | Samsung Advanced Institute of Technology, Samsung Electronics, Suwon, Republic of Korea. |
| Pseudocode | Yes | Pseudocode for training the proposed algorithm is provided in Appendix A.1. |
| Open Source Code | No | The paper does not explicitly state that the source code for the methodology described is publicly available or provide a link to it. A link to the baseline NeRV code is provided, but not to code for the authors' own method. |
| Open Datasets | Yes | Dataset: Following prior video INR methods (e.g., Chen et al., 2021a; Li et al., 2022), we demonstrate the effectiveness of our framework on the UVG dataset (Mercat et al., 2020), a widely used benchmark for video compression. |
| Dataset Splits | No | The paper does not explicitly specify training, validation, and test dataset splits by percentages, counts, or references to predefined split files. |
| Hardware Specification | Yes | We used a single NVIDIA A100 GPU (80GB) and a batch size of 4 throughout this experiment. |
| Software Dependencies | No | The paper mentions 'ffmpeg (Tomar, 2006)', 'Adam optimizer (Kingma & Ba, 2015)', and 'pytorch library' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Hyperparameters: For our experiments with the UVG dataset (Mercat et al., 2020), the hyperparameters $b$ and $l$ in the positional embedding (Eq. 10) are set to 1.25 and 80, respectively, and the output channel of the first layer in the MLP is 512. We use upscale factors (5, 3, 2, 2, 2) in the NeRV blocks and GELU (Hendrycks & Gimpel, 2016) activation, as suggested by the authors. Training Details of the Proposed Framework: We train our framework for 200 epochs using the Adam optimizer (Kingma & Ba, 2015) with a cosine learning rate scheduler and a batch size of 4 on a single NVIDIA A100 GPU (80GB). Multiple learning rates ranging from 0.015 to 0.200 are swept over, and the best results are reported. Throughout our experiments, networks are trained in full precision (FP32). We use 3 levels of supermasks with $k_1 = 0.2$, and the other densities $\{k_n\}_{n=2}^{3}$ are chosen by the linear method of Okoshi et al. (2022) unless specified otherwise. (See the sketch after this table.) |
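
For concreteness, below is a minimal sketch of the embedding and optimizer setup described in the Experiment Setup row, assuming the standard NeRV-style positional embedding (sin/cos pairs at frequencies $b^0, \dots, b^{l-1}$). The function name `nerv_positional_embedding`, the placeholder `torch.nn.Linear` layer, and the learning rate of 0.05 (chosen from inside the reported 0.015-0.200 sweep range) are illustrative assumptions, not details taken from the paper.

```python
import math
import torch

def nerv_positional_embedding(t: torch.Tensor, b: float = 1.25, l: int = 80) -> torch.Tensor:
    """NeRV-style positional embedding of a normalized frame index t in [0, 1].

    Returns a 2*l-dimensional vector of sin/cos pairs at geometrically spaced
    frequencies b**0 .. b**(l-1); b=1.25 and l=80 match the values reported above.
    """
    freqs = b ** torch.arange(l, dtype=torch.float32)         # (l,)
    angles = math.pi * freqs * t                              # (l,)
    return torch.cat([torch.sin(angles), torch.cos(angles)])  # (2*l,) = (160,)

# Illustrative training setup (hypothetical placeholder model): Adam with a
# cosine schedule over 200 epochs, starting from a learning rate inside the
# reported sweep range (0.015-0.200).
model = torch.nn.Linear(160, 512)  # stands in for the first MLP layer (output channel 512)
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

# Example: embed frame 10 of a 600-frame UVG sequence.
emb = nerv_positional_embedding(torch.tensor(10 / 600))
print(emb.shape)  # torch.Size([160])
```

This sketch covers only the input embedding and optimizer configuration; the supermask levels ($k_1 = 0.2$, with the remaining densities from the linear rule of Okoshi et al., 2022) and the NeRV decoder blocks with upscale factors (5, 3, 2, 2, 2) follow the authors' Appendix A.1 and are not reproduced here.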