Recurrent Video Restoration Transformer with Guided Deformable Attention
Authors: Jingyun Liang, Yuchen Fan, Xiaoyu Xiang, Rakesh Ranjan, Eddy Ilg, Simon Green, Jiezhang Cao, Kai Zhang, Radu Timofte, Luc Van Gool
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime. The codes are available at https://github.com/JingyunLiang/RVRT. |
| Researcher Affiliation | Collaboration | ¹Computer Vision Lab, ETH Zurich, Switzerland; ²Meta Inc.; ³University of Würzburg, Germany |
| Pseudocode | No | The paper describes the methodology using text and diagrams but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes are available at https://github.com/JingyunLiang/RVRT. |
| Open Datasets | Yes | For video SR, we consider two settings: bicubic (BI) and blur-downsampling (BD) degradation. For BI degradation, we train the model on two different datasets: REDS [53] and Vimeo-90K [87]... (An illustrative sketch of the two degradation settings follows the table.) |
| Dataset Splits | No | The paper mentions training on REDS and Vimeo-90K, and testing on their respective test sets (REDS4, Vimeo-90K-T), but does not explicitly specify a separate validation dataset split. |
| Hardware Specification | No | The paper reports model size, testing memory, and runtime, but does not specify the hardware (e.g., GPU/CPU models, memory) on which the experiments were conducted. |
| Software Dependencies | No | The paper mentions using specific components like the Charbonnier loss [12], the Adam optimizer [33], the Cosine Annealing scheme [52], and SpyNet [58, 56], but does not provide version numbers for any software libraries or frameworks used in the implementation. (A sketch of the Charbonnier loss follows the table.) |
| Experiment Setup | Yes | For shallow feature extraction and HQ frame reconstruction, we use 1 RSTB that has 2 Swin Transformer layers. For recurrent feature refinement, we use 4 refinement modules with a clip size of 2, each of which has 2 MRSTBs with 2 modified Swin Transformer layers. For both RSTB and MRSTB, the spatial attention window size and head number are 8×8 and 6, respectively. We use 144 channels for video SR and 192 channels for deblurring and denoising. In GDA, we use 12 deformable groups and 12 deformable heads with 9 candidate locations... In training, we randomly crop 256×256 HQ patches and use different video lengths for different datasets... The Adam optimizer [33] with default settings is used to train the model for 600,000 iterations with a batch size of 8. The learning rate is initialized as 4×10⁻⁴ and decreased with the Cosine Annealing scheme [52]. To stabilize training, we initialize SpyNet [58, 56] with pretrained weights, fix it for the first 30,000 iterations, and reduce its learning rate by 75%. (A hedged sketch of this training schedule follows the table.) |
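The BI/BD settings quoted in the Open Datasets row follow standard video-SR conventions: BI is plain bicubic downsampling, while BD blurs before subsampling. The sketch below is an illustration of those conventions, not the authors' data pipeline; the 13×13 Gaussian kernel, σ = 1.6, and strided subsampling are assumptions based on common practice in the BD literature, as the quoted text does not specify them.

```python
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur


def bi_degrade(hq: torch.Tensor, scale: int = 4) -> torch.Tensor:
    """BI degradation: plain bicubic downsampling of an (N, C, H, W) batch."""
    return F.interpolate(hq, scale_factor=1 / scale, mode="bicubic",
                         align_corners=False)


def bd_degrade(hq: torch.Tensor, scale: int = 4,
               sigma: float = 1.6) -> torch.Tensor:
    """BD degradation: Gaussian blur, then s-fold subsampling.

    The 13x13 kernel, sigma=1.6, and strided subsampling are conventional
    choices assumed here; the paper excerpt above does not spell them out.
    """
    blurred = gaussian_blur(hq, kernel_size=13, sigma=sigma)
    return blurred[..., ::scale, ::scale]
```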
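The Charbonnier loss [12] cited in the Software Dependencies row is the smooth L1-like penalty sqrt((x − y)² + ε²). A minimal PyTorch sketch is given below; the paper excerpt does not quote its ε, so the commonly used value of 1e-3 is an assumption.

```python
import torch


def charbonnier_loss(pred: torch.Tensor, target: torch.Tensor,
                     eps: float = 1e-3) -> torch.Tensor:
    """Charbonnier loss: a differentiable, L1-like penalty.

    eps=1e-3 is a common default, assumed here because the paper
    excerpt above does not report the value used.
    """
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()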
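The optimization recipe in the Experiment Setup row maps onto standard PyTorch components. The sketch below is an illustration under stated assumptions, not the authors' released code: `model` and `flow_net` are stand-in placeholder modules for RVRT and the pretrained SpyNet, the 75% learning-rate reduction is expressed as a 0.25× param-group multiplier, and the 30,000-iteration freeze is implemented by toggling gradients.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

TOTAL_ITERS = 600_000   # 600,000 training iterations (from the paper)
FLOW_FREEZE = 30_000    # SpyNet fixed for the first 30,000 iterations
BASE_LR = 4e-4          # initial learning rate 4x10^-4

# Placeholders standing in for the actual RVRT and SpyNet modules.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
flow_net = torch.nn.Conv2d(6, 2, 3, padding=1)

# Two param groups: full LR for the main model, 25% of it for the
# flow estimator ("reduce its learning rate by 75%").
optimizer = Adam([
    {"params": model.parameters(), "lr": BASE_LR},
    {"params": flow_net.parameters(), "lr": BASE_LR * 0.25},
])
scheduler = CosineAnnealingLR(optimizer, T_max=TOTAL_ITERS)

for it in range(TOTAL_ITERS):
    # Keep SpyNet frozen early on, then fine-tune it jointly.
    flow_net.requires_grad_(it >= FLOW_FREEZE)
    ...  # forward pass, Charbonnier loss, backward, optimizer.step()
    scheduler.step()
```

Whether the released code steps the cosine schedule per iteration exactly this way is not stated in the excerpt; the per-iteration `scheduler.step()` here simply reflects the iteration-based schedule the paper reports.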