Semantic Lens: Instance-Centric Semantic Alignment for Video Super-resolution

Authors: Qi Tang, Yao Zhao, Meiqin Liu, Jian Jin, Chao Yao

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of our model over existing state-of-the-art VSR methods.
Researcher Affiliation | Collaboration | Qi Tang (1,2), Yao Zhao (1,2), Meiqin Liu (1,2), Jian Jin (3), Chao Yao (4)*. 1: Institute of Information Science, Beijing Jiaotong University, Beijing, China; 2: Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China; 3: Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore; 4: School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a direct statement or link for open-source code for the methodology described.
Open Datasets | Yes | We evaluate the performance of Semantic Lens on the benchmark datasets widely used in the field of video instance segmentation, YouTube-VIS (YTVIS) (Yang, Fan, and Xu 2019), available in three versions (2019, 2021 (Yang et al. 2021), and 2022 (Yang, Fan, and Xu 2022)).
Dataset Splits | Yes | YTVIS-19 consists of 2,238 high-resolution video clips for training and 302 for validation... The improved and extended successors, YTVIS-21 and YTVIS-22, share a training set that comprises 2,985 videos. Moreover, additional videos are included in YTVIS-21 for validation, nearly doubling the annotation quantity compared to its 2019 predecessor.
Hardware Specification | Yes | The model is implemented with PyTorch-2.0 and trained across 4 NVIDIA 3090 GPUs.
Software Dependencies | Yes | The model is implemented with PyTorch-2.0.
Experiment Setup | Yes | We train our model with five input frames (T = 5) sampled from the same video, and set the input patch size of LR frames as 64 × 64. For optimization, we use AdamW with β1 = 0.9, β2 = 0.99, and weight decay = 10⁻⁴. The learning rate is initialized to 2 × 10⁻⁴. The Charbonnier loss is applied on whole frames between the ground-truth I_HR and the restored frame I_SR.
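
For readers mapping the quoted training recipe onto code, the following is a minimal PyTorch sketch of the optimizer and loss configuration. Only the AdamW hyper-parameters (β1 = 0.9, β2 = 0.99, weight decay 10⁻⁴), the 2 × 10⁻⁴ learning rate, the T = 5 frame window, the 64 × 64 LR patches, and the Charbonnier loss come from the reported setup; the placeholder network, the Charbonnier epsilon, the batch size, and the 4× scale factor are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def charbonnier_loss(sr, hr, eps=1e-3):
    """Charbonnier loss computed over whole frames; the eps value is an assumption."""
    return torch.mean(torch.sqrt((sr - hr) ** 2 + eps ** 2))


# Hypothetical stand-in for the Semantic Lens network (not the paper's architecture).
model = nn.Conv3d(3, 3, kernel_size=3, padding=1)

# Optimizer settings quoted in the table: AdamW, betas (0.9, 0.99),
# weight decay 1e-4, initial learning rate 2e-4.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=2e-4, betas=(0.9, 0.99), weight_decay=1e-4
)

# One illustrative step: T = 5 LR frames of 64 x 64 per sample; the batch size
# of 4 and the 4x upscaling factor are assumptions, not stated in the quote.
lr_frames = torch.randn(4, 3, 5, 64, 64)    # (B, C, T, H, W) low-resolution input
hr_frames = torch.randn(4, 3, 5, 256, 256)  # ground-truth HR frames (assumed 4x scale)

# Placeholder "restoration": conv features upsampled spatially to the HR size.
sr_frames = F.interpolate(
    model(lr_frames), scale_factor=(1, 4, 4), mode="trilinear", align_corners=False
)

loss = charbonnier_loss(sr_frames, hr_frames)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the Conv3d-plus-interpolation stand-in would be replaced by the actual Semantic Lens restoration network; the sketch only shows how the quoted hyper-parameters and loss map onto standard PyTorch components.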