Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DeNC: Unleash Neural Codecs in Video Streaming with Diffusion Enhancement
Authors: Qihua Zhou, Ruibin Li, Jingcai Guo, Yaodong Huang, Zhenda Xu, Laizhong Cui, Song Guo
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Real-world evaluations show that DeNC improves compression ratios with nearly an order of magnitude and achieves much higher restoration quality (e.g., 93+ VMAF and 23% higher MOS) over the latest baselines. ... Extensive experiments on public cloud services with YouTube videos show that DeNC significantly improves compression ratios with nearly an order of magnitude and achieves much higher restoration quality, e.g., 23% higher mean opinion score (MOS), over the latest methods. |
| Researcher Affiliation | Academia | 1College of Computer Science and Software Engineering, Shenzhen University; 2Department of Computing, The Hong Kong Polytechnic University; 3Department of Computer Science and Engineering, The Hong Kong University of Science and Technology. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Training diffusion model θ until convergence ... Algorithm 2: Inference in T steps for frame restoration |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for their method, nor does it provide a link to a code repository. |
| Open Datasets | No | All restoration models used in baselines and DeNC are trained under the same videos collected from YouTube and Netflix. ... We inspect the compression ratios achieved by DeNC and other baselines under different video settings. |
| Dataset Splits | No | The paper mentions training models and using videos from YouTube and Netflix, but it does not provide specific details on how these videos were split into training, validation, and test sets (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | Yes | For example, enhancing a 10-second video only takes 780ms when using an NVIDIA 4090 GPU, i.e., less than 8.3% additional time cost is incurred. |
| Software Dependencies | No | The paper mentions using a UNet backbone and Adam optimizer with specific configurations (e.g., learning rate, batch size) but does not provide specific version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | The number of base feature channels is 64. To capture the time sequence information, the diffusion step index t is specified by adding the sinusoidal position embedding into each residual block. Given a maximum step number T, we control the noise variance βt (t ∈ [1, T]) through a linear quadratic scheduler, which gradually ranges from β1 = 10⁻⁴ to βT = 0.02. Also, we adopt the Mean of Squared Error (MSE) loss with Adam optimizer (Kingma and Ba 2015) and 16 batch size to train the diffusion model. The total number of training epochs is 10K and the initial learning rate is 8 × 10⁻⁵. |
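The noise-variance schedule quoted in the Experiment Setup row (βt ranging from β1 = 10⁻⁴ to βT = 0.02) can be sketched as below. This is a minimal illustration, not the paper's implementation: it assumes a plain linear interpolation of βt (the paper's "linear quadratic scheduler" may differ) and shows the standard DDPM forward-diffusion step that such a schedule drives; the function names `linear_beta_schedule` and `forward_diffuse` are hypothetical.

```python
import numpy as np

def linear_beta_schedule(T, beta_1=1e-4, beta_T=0.02):
    """Noise variances beta_t rising linearly from beta_1 to beta_T
    over T diffusion steps (endpoints match the paper's quoted values)."""
    return np.linspace(beta_1, beta_T, T)

def forward_diffuse(x0, t, betas, rng):
    """Standard DDPM forward step: sample
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta_s).
    Returns the noised sample and the noise eps, the regression
    target for an MSE-trained noise-prediction network."""
    alpha_bar = np.cumprod(1.0 - betas)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps
```

Under this schedule, training would draw a random step t, noise a clean frame with `forward_diffuse`, and minimize the MSE between the network's predicted noise and `eps`, matching the loss and optimizer settings quoted above.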