Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DIFFSSR: Stereo Image Super-resolution Using Differential Transformer

Authors: Dafeng Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that DIFFSSR outperforms state-of-the-art methods, including NAFSSR and SwinFIRSSR, in terms of both quantitative metrics and visual quality. Code is available at https://github.com/Zdafeng/DIFFSSR.
Researcher Affiliation | Industry | Dafeng Zhang Samsung R&D Institute China-Beijing (SRC-B) EMAIL
Pseudocode | No | The paper describes the methodology with equations and figures (e.g., Figure 3: The overall network architecture of DIFFSSR) but does not present structured pseudocode blocks or algorithms.
Open Source Code | Yes | Code is available at https://github.com/Zdafeng/DIFFSSR.
Open Datasets | Yes | Dataset. The training dataset for our proposed model consists of a combination of images from the Flickr1024 dataset [12] and the Middlebury dataset [24]. Specifically, we utilize 800 stereo image pairs from Flickr1024 and 60 pairs from Middlebury. Then, low-resolution (LR) images are created by applying bicubic downsampling to the HR images with scaling factors of 2 and 4. The resulting LR images are cropped into 32 × 96 patches with a stride of 16, and their HR counterparts undergo corresponding cropping. For testing, we employ a popular benchmark comprising 20 pairs of images from the KITTI 2012 dataset [25], 20 pairs of images from the KITTI 2015 dataset [26], 112 pairs of images from the Flickr1024 dataset [12], and 5 pairs of images from the Middlebury dataset [24].
Dataset Splits | Yes | The training dataset for our proposed model consists of a combination of images from the Flickr1024 dataset [12] and the Middlebury dataset [24]. Specifically, we utilize 800 stereo image pairs from Flickr1024 and 60 pairs from Middlebury. Then, low-resolution (LR) images are created by applying bicubic downsampling to the HR images with scaling factors of 2 and 4. The resulting LR images are cropped into 32 × 96 patches with a stride of 16, and their HR counterparts undergo corresponding cropping. For testing, we employ a popular benchmark comprising 20 pairs of images from the KITTI 2012 dataset [25], 20 pairs of images from the KITTI 2015 dataset [26], 112 pairs of images from the Flickr1024 dataset [12], and 5 pairs of images from the Middlebury dataset [24].
Hardware Specification | Yes | The model is trained on four NVIDIA GeForce RTX 3090 GPU, and the training process is carefully monitored to ensure convergence and to adjust hyperparameters as necessary for optimal performance. ... We test our method on NVIDIA GeForce RTX 3090 GPU with the resolution 32 × 96.
Software Dependencies | No | The paper does not provide specific version numbers for key software components or libraries used for its implementation.
Experiment Setup | Yes | The training process for DIFFSSR is conducted over 500,000 iterations with a batch size of 8. We initialize the learning rate at 2e-4 and employ a cosine annealing strategy [27] to gradually decrease the learning rate to 1e-7. Data augmentation techniques, including random horizontal and vertical flips and channel shuffle [4], are applied to enhance dataset diversity. Additionally, we employ a Charbonnier L1 loss function [28] to measure the difference between the super-resolved and ground-truth stereo images.
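
The numbers quoted in the Open Datasets and Experiment Setup rows can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the DIFFSSR repository: the function names are hypothetical, the schedule assumes cosine annealing without warm-up or restarts, the Charbonnier epsilon of 1e-3 is an assumed value that the quoted text does not state, and the patch size is read as height 32 by width 96.

```python
import math

# Values quoted in the Experiment Setup row.
TOTAL_ITERS = 500_000
LR_MAX = 2e-4
LR_MIN = 1e-7


def cosine_lr(step, total=TOTAL_ITERS, lr_max=LR_MAX, lr_min=LR_MIN):
    """Cosine annealing (assumed: no warm-up, no restarts).

    Returns lr_max at step 0 and lr_min at step == total.
    """
    step = min(max(step, 0), total)  # clamp to the training range
    cosine = math.cos(math.pi * step / total)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


def charbonnier(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(d^2 + eps^2) over flat sequences.

    eps is an assumption; the quoted text does not give its value.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def patch_origins(height, width, patch_h=32, patch_w=96, stride=16):
    """Top-left (row, col) corners of LR patches cut on a regular grid.

    Assumes the patch is 32 rows by 96 columns and that partial
    patches at the image border are discarded.
    """
    return [
        (y, x)
        for y in range(0, height - patch_h + 1, stride)
        for x in range(0, width - patch_w + 1, stride)
    ]


if __name__ == "__main__":
    for step in (0, TOTAL_ITERS // 2, TOTAL_ITERS):
        print(step, cosine_lr(step))
    print(len(patch_origins(48, 112)), "patches in a 48 x 112 LR image")
```

For a scale factor s, the matching HR patch under this reading would start at (s * row, s * col) and span 32s by 96s pixels, which is one way to interpret "corresponding cropping" in the quoted text.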