Flow-Guided Sparse Transformer for Video Deblurring
Authors: Jing Lin, Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Youliang Yan, Xueyi Zou, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate that our proposed FGST outperforms state-of-the-art (SOTA) methods on both DVD and GOPRO datasets and yields visually pleasant results in real video deblurring. |
| Researcher Affiliation | Collaboration | ¹Shenzhen International Graduate School, Tsinghua University; ²Huawei Noah's Ark Lab; ³ETH Zürich. |
| Pseudocode | No | The paper describes the proposed method in text and figures but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/linjing7/VR-Baseline |
| Open Datasets | Yes | DVD. The DVD (Su et al., 2017) dataset consists of 71 videos with 6,708 blurry-sharp image pairs. |
| Dataset Splits | No | The paper explicitly states train/test splits for the datasets (DVD: 61 videos train, 10 videos test; GOPRO: 2:1 train/test) but does not mention a distinct validation split. |
| Hardware Specification | Yes | The models are trained with 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'SPyNet' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We implement FGST in PyTorch. We adopt a pre-trained SPyNet (Ranjan et al., 2017) as the optical flow estimator. All the modules are trained with the Adam (Kingma & Ba, 2015) optimizer (β1 = 0.9 and β2 = 0.999) for 600 epochs. The initial learning rate is set to 2 × 10⁻⁴ and 2.5 × 10⁻⁵ respectively for the deblurring model and optical flow estimator. The learning rate is halved every 200 epochs during the training procedure. Patches at the size of 256 × 256 cropped from training frames are fed into the models. The batch size is 8. The temporal radius r of the neighboring frames is set to 1. The sequence length T is set to 9 in training and the whole video length in testing. The horizontal and vertical flips are performed for data augmentation. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) (Wang et al., 2004) are adopted as the evaluation metrics. The models are trained with 8 V100 GPUs. L1 loss between the restored and GT videos is used for supervision. |
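
The Experiment Setup row fully specifies the optimization recipe: Adam with β1 = 0.9 and β2 = 0.999, per-module learning rates of 2 × 10⁻⁴ (deblurring model) and 2.5 × 10⁻⁵ (flow estimator), halving every 200 epochs over 600 epochs, 256 × 256 patches at batch size 8, and L1 supervision. Below is a minimal PyTorch sketch of that recipe only; the two `nn.Conv2d` modules are hypothetical stand-ins for the FGST deblurring model and the pre-trained SPyNet flow estimator (the real architectures are in the linked VR-Baseline repository), and synthetic tensors replace the DVD/GOPRO data loader with its 9-frame sequences and flip augmentation.

```python
# Sketch of the reported training configuration, assuming placeholder modules
# in place of FGST and SPyNet. Not the authors' released implementation.
import torch
import torch.nn as nn

deblur_net = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the FGST deblurring model
flow_net = nn.Conv2d(6, 2, 3, padding=1)     # stand-in for the pre-trained SPyNet estimator

# Separate learning rates: 2e-4 for the deblurring model, 2.5e-5 for the flow estimator.
optimizer = torch.optim.Adam(
    [
        {"params": deblur_net.parameters(), "lr": 2e-4},
        {"params": flow_net.parameters(), "lr": 2.5e-5},
    ],
    betas=(0.9, 0.999),
)
# Learning rate is halved every 200 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
criterion = nn.L1Loss()                      # L1 loss between restored and ground-truth frames

for epoch in range(600):
    # In the real pipeline, each step draws a batch of eight 256x256 patches
    # from 9-frame training sequences with horizontal/vertical flips applied;
    # a single synthetic batch keeps this example self-contained.
    blurry = torch.rand(8, 3, 256, 256)
    sharp = torch.rand(8, 3, 256, 256)
    restored = deblur_net(blurry)            # the real model also consumes flow_net's output
    loss = criterion(restored, sharp)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```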