Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution
Authors: Qi Tang, Yao Zhao, Meiqin Liu, Jian Jin, Chao Yao
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods. |
| Researcher Affiliation | Collaboration | Qi Tang (1,2), Yao Zhao (1,2), Meiqin Liu (1,2), Jian Jin (3), Chao Yao (4)*. (1) Institute of Information Science, Beijing Jiaotong University, Beijing, China; (2) Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China; (3) Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore; (4) School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | We evaluate the performance of Semantic Lens on the benchmark datasets widely used in the field of video instance segmentation, YouTube-VIS (YTVIS) (Yang, Fan, and Xu 2019), available in three versions (2019, 2021 (Yang et al. 2021), and 2022 (Yang, Fan, and Xu 2022)). |
| Dataset Splits | Yes | YTVIS-19 consists of 2,238 high-resolution video clips for training and 302 for validation... The improved and extended successors, YTVIS-21 and YTVIS-22, share a training set that comprises 2,985 videos. Moreover, additional videos are included in YTVIS-21 for validation, nearly doubling the annotation quantity compared to its 2019 predecessor. |
| Hardware Specification | Yes | The model is implemented with PyTorch-2.0 and trained across 4 NVIDIA 3090 GPUs. |
| Software Dependencies | Yes | The model is implemented with PyTorch-2.0. |
| Experiment Setup | Yes | We train our model with five input frames (T = 5) sampled from the same video, and set the input patch size of LR frames as 64 × 64. For optimization, we use AdamW with β1 = 0.9, β2 = 0.99 and weight decay = 10⁻⁴. The learning rate is initialized to 2 × 10⁻⁴. The Charbonnier loss is applied on whole frames between the ground-truth I_HR and restored frame I_SR. |
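
To make the quoted setup concrete, below is a minimal PyTorch sketch of one training step with the stated hyperparameters (AdamW with β1 = 0.9, β2 = 0.99, weight decay 10⁻⁴, learning rate 2 × 10⁻⁴, T = 5 input frames, 64 × 64 LR patches). The placeholder model, the 4× upscaling factor, and the Charbonnier ε = 10⁻³ are assumptions for illustration, not details taken from the paper.

```python
import torch


def charbonnier_loss(sr: torch.Tensor, hr: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Charbonnier loss between the restored frame I_SR and the ground-truth I_HR."""
    return torch.sqrt((sr - hr) ** 2 + eps ** 2).mean()


# Placeholder stand-in for the VSR model (NOT the paper's architecture):
# a single conv followed by 4x bilinear upsampling, just to make the step runnable.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 3, kernel_size=3, padding=1),
    torch.nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
)

# Optimizer with the hyperparameters quoted in the table row above.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,             # learning rate initialized to 2 x 10^-4
    betas=(0.9, 0.99),   # beta1 = 0.9, beta2 = 0.99
    weight_decay=1e-4,   # weight decay = 10^-4
)

# One illustrative training step: T = 5 LR frames cropped to 64 x 64 patches.
lr_clip = torch.rand(1, 5, 3, 64, 64)    # (batch, T, C, H, W) low-resolution clip
hr_frame = torch.rand(1, 3, 256, 256)    # ground-truth HR center frame (4x scale assumed)

optimizer.zero_grad()
sr_frame = model(lr_clip[:, 2])          # restore the center frame (illustration only)
loss = charbonnier_loss(sr_frame, hr_frame)
loss.backward()
optimizer.step()
```

In the paper the loss is applied on whole frames between I_HR and I_SR; the fixed-size random tensors here only stand in for a real data loader over the YTVIS clips.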