Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks
Authors: Matthew Dutson, Nathan Labiosa, Yin Li, Mohit Gupta
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments validate our approach on several vision tasks including denoising (NAFNet), image enhancement (HDRNet), monocular depth (Depth Anything v2), and semantic segmentation (DeepLabv3+). Our method improves temporal stability and robustness against a range of image corruptions (including compression artifacts, noise, and adverse weather), while preserving or improving the quality of predictions. |
| Researcher Affiliation | Academia | Matthew Dutson, Nathan Labiosa, Yin Li, and Mohit Gupta, University of Wisconsin-Madison |
| Pseudocode | No | The paper describes algorithms and methods using mathematical equations and textual descriptions (e.g., Section 5 'Designing Stabilization Adapters', Appendix D 'Method Details'), but it does not include a clearly labeled pseudocode block or algorithm section with structured steps formatted like code. |
| Open Source Code | No | At publication we will release our code with a detailed README. We use publicly-available datasets and well-documented open-source tools. Our codebase includes shell scripts for downloading and preparing the required datasets. |
| Open Datasets | Yes | We use the HDRNet model [18]... We generate training pairs by applying the local Laplacian filter to each frame of the Need for Speed (NFS) dataset [29]. ... We use the NAFNet model [9]... We again use the NFS dataset... We also include results for depth estimation with Depth Anything v2 [75]... For depth training, we use the Vision Sim framework [25] (Blender) to generate a dataset of simulated videos with ground-truth depth. ... We now evaluate robustness under adverse weather conditions. Specifically, we consider the rain and snow corruptions from the RobustSpring [58] dataset. ... We train and evaluate on the VIPER dataset [56]... |
| Dataset Splits | Yes | NFS contains 100 videos (380k frames) collected at 240 FPS; we randomly select 20 videos for validation and use the remaining 80 for training. ... The dataset consists of 50 indoor scenes containing ego motion and is rendered at 50 FPS. We randomly select 10 scenes for validation and use the rest for training. ... RobustSpring contains 10 rendered sequences (2000 total frames)... We randomly select 2 videos for validation and use the remaining 8 for training. ... The predefined training and validation splits contain 77 sequences (134097 frames) and 47 sequences (49815 frames), respectively. |
| Hardware Specification | Yes | We train and evaluate on a compute cluster largely using RTX A4500 GPUs. |
| Software Dependencies | No | The paper mentions several models (HDRNet, NAFNet, Depth Anything v2, DeepLabv3+), optimization methods (Adam optimizer [31]), and third-party tools (Torchvision ElasticTransform class, Vision Sim framework), but it does not specify version numbers for these software components or for the general programming environment (Python, PyTorch, or CUDA) used in the implementation. |
| Experiment Setup | Yes | We train for 80 epochs (2k iterations per epoch), using the Adam optimizer [31], an MSE loss, and batches of 8 randomly sampled frames. The learning rate is initially set to 10⁻⁴ and is scaled by 0.1 after epochs 40 and 60. Stabilizers are trained for 20 epochs (4k iterations per epoch) using the Adam optimizer and our unified loss with δ = ||·||2. The learning rate is initialized to 10⁻³ for the simple learned stabilizer and 10⁻⁴ for other variants, and is reduced by a factor of 10 after epochs 10 and 15. |
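
The step schedules quoted in the Experiment Setup row can be written out as a minimal sketch. It is not the authors' code, which was not released at the time of this summary. It assumes that the learning rates read 10⁻⁴ and 10⁻³ (the minus signs appear to have been lost in extraction), that epochs are 0-indexed, and that "after epoch 40" means the reduced rate applies from epoch index 40 onward. The names `step_lr`, `schedule`, and the three config dictionaries are hypothetical.

```python
def step_lr(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate in effect during `epoch` (0-indexed, an assumption).

    The rate is multiplied by `gamma` once for every milestone
    that the epoch index has reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Values quoted in the table; the dictionary layout itself is illustrative.
# Base-model training: 80 epochs, rate 1e-4, scaled by 0.1 after epochs 40 and 60.
BASE_MODEL = {"epochs": 80, "base_lr": 1e-4, "milestones": (40, 60)}
# Stabilizer training: 20 epochs, rate reduced 10x after epochs 10 and 15.
STABILIZER = {"epochs": 20, "base_lr": 1e-4, "milestones": (10, 15)}
# The simple learned stabilizer starts from 1e-3 instead of 1e-4.
SIMPLE_STABILIZER = {"epochs": 20, "base_lr": 1e-3, "milestones": (10, 15)}


def schedule(cfg):
    """Per-epoch learning rates for one of the configs above."""
    return [
        step_lr(e, cfg["base_lr"], cfg["milestones"])
        for e in range(cfg["epochs"])
    ]


if __name__ == "__main__":
    for name, cfg in [("base", BASE_MODEL),
                      ("stabilizer", STABILIZER),
                      ("simple stabilizer", SIMPLE_STABILIZER)]:
        rates = schedule(cfg)
        print(f"{name}: first={rates[0]:.0e}, last={rates[-1]:.0e}")
```

Under these assumptions the base model finishes training at a rate one hundred times smaller than its initial value, and so does each stabilizer variant. In a PyTorch setup the same behavior would typically come from a multi-step scheduler with matching milestones, although the table does not say which scheduler the authors used.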