Intrinsic Phase-Preserving Networks for Depth Super Resolution

Authors: Xuanhong Chen, Hang Wang, Jialiang Chen, Kairui Feng, Jinfan Liu, Xiaohang Wang, Weimin Zhang, Bingbing Ni

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various benchmark datasets, e.g., NYU v2, RGB-D-D, reach SOTA performance and also well demonstrate the validity of the proposed phase-preserving scheme.
Researcher Affiliation | Collaboration | Xuanhong Chen1,2*, Hang Wang3, Jialiang Chen1,2, Kairui Feng4, Jinfan Liu1, Xiaohang Wang1, Weimin Zhang1,2, Bingbing Ni1,2. 1Shanghai Jiao Tong University, Shanghai 200240, China; 2USC-SJTU Institute of Cultural and Creative Industry; 3Huawei; 4National Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University
Pseudocode | No | The paper does not contain a dedicated section or figure explicitly labeled 'Pseudocode' or 'Algorithm' with structured steps.
Open Source Code | Yes | Code: https://github.com/neuralchen/IPPNet/. The source code will be released for reproducibility.
Open Datasets | Yes | NYU v2 (Silberman et al. 2012) contains 1449 RGB/D pairs collected by Kinect, of which 1000 RGB/D pairs are used for training, and 449 RGB/D pairs are used for testing. Our model is also tested on Middlebury (Scharstein and Pal 2007; Hirschmüller and Scharstein 2007) and Lu (Lu, Ren, and Liu 2014).
Dataset Splits | No | NYU v2 (Silberman et al. 2012) contains 1449 RGB/D pairs collected by Kinect, of which 1000 RGB/D pairs are used for training, and 449 RGB/D pairs are used for testing.
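The quoted split (1449 pairs: 1000 training, 449 testing) can be expressed directly as index ranges. A minimal sketch; taking the first 1000 indices for training follows common convention, since the excerpt does not state which indices are used:

```python
# Sketch of the quoted NYU v2 split: 1449 RGB/D pairs, 1000 for training
# and 449 for testing. Taking the first 1000 indices as the training set
# is an assumption (a common convention), not stated in the excerpt above.
all_indices = list(range(1449))
train_indices = all_indices[:1000]  # 1000 RGB/D pairs for training
test_indices = all_indices[1000:]   # remaining 449 RGB/D pairs for testing

assert len(train_indices) == 1000 and len(test_indices) == 449
```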
Hardware Specification | Yes | Our method is implemented with PyTorch (Paszke et al. 2017), and one NVIDIA Tesla V100 GPU is used for training. The running time is tested at a size of 640×480 on one NVIDIA TITAN XP GPU over 10 independent runs.
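The quoted timing protocol (640×480 output, averaged over 10 independent runs) corresponds to a standard CUDA-synchronized timing loop. A minimal sketch; `DummyNet`, its two-input signature (LR depth plus HR RGB guide), and the tensor shapes are placeholders, not taken from the IPPNet codebase:

```python
import time
import torch
import torch.nn as nn
import torch.nn.functional as F

# Timing sketch matching the quoted protocol: 640x480 target resolution,
# averaged over 10 independent runs. DummyNet stands in for the released
# IPPNet model; its interface is an assumption.
class DummyNet(nn.Module):
    def forward(self, lr_depth, rgb):
        # Placeholder compute: upsample LR depth to the RGB resolution.
        return F.interpolate(lr_depth, size=rgb.shape[-2:],
                             mode="bicubic", align_corners=False)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DummyNet().to(device).eval()

rgb = torch.randn(1, 3, 480, 640, device=device)       # HR RGB guide
lr_depth = torch.randn(1, 1, 120, 160, device=device)  # 4x-downsampled depth

timings = []
with torch.no_grad():
    _ = model(lr_depth, rgb)  # warm-up run (CUDA context, cuDNN autotune)
    for _ in range(10):
        if device.type == "cuda":
            torch.cuda.synchronize()  # exclude previously queued kernels
        start = time.perf_counter()
        _ = model(lr_depth, rgb)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for all kernels to finish
        timings.append(time.perf_counter() - start)

print(f"mean runtime over 10 runs: {sum(timings) / len(timings) * 1e3:.2f} ms")
```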
Software Dependencies | No | Our method is implemented with PyTorch (Paszke et al. 2017), and one NVIDIA Tesla V100 GPU is used for training.
Experiment Setup | Yes | Following JIIF (Tang, Chen, and Zeng 2021), HR image is randomly cropped into (256, 256) patches during training. LR input depth map is generated from HR ground truth using bicubic downsampling at different ratios (4×, 8×, 16×). Depth encoder, RGB encoder and refinement module consist of 4 ResBlocks (He et al. 2016) without batch normalization layer. The number N of phase-preserving filtering blocks is set to 3. Adam (Kingma and Ba 2015) optimizer is used to train our model. The learning rate is initially set to 1 × 10⁻⁴ and then halved every 8K iterations.
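The quoted recipe (256×256 HR crops, bicubic LR generation at 4×/8×/16×, Adam at 1 × 10⁻⁴ halved every 8K iterations) maps onto a few standard PyTorch pieces. A minimal sketch, assuming a stand-in `DummyNet` and random tensors in place of the released IPPNet and the NYU v2 loader; the L1 loss is also an assumption, not quoted above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Sketch of the quoted training setup. DummyNet and the random tensors
# stand in for the released IPPNet (N = 3 phase-preserving filtering
# blocks) and the real dataset; the L1 loss is assumed, not quoted.
class DummyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, lr_depth, rgb):
        up = F.interpolate(lr_depth, size=rgb.shape[-2:],
                           mode="bicubic", align_corners=False)
        return self.head(up)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DummyNet().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Halve the learning rate every 8K iterations (stepped once per iteration).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=8000, gamma=0.5)

# Stand-in data: random 256x256 HR depth crops with matching RGB guides.
data = TensorDataset(torch.randn(8, 1, 256, 256), torch.randn(8, 3, 256, 256))
scale = 4  # training also uses the 8x and 16x ratios

for hr_depth, rgb in DataLoader(data, batch_size=1):
    hr_depth, rgb = hr_depth.to(device), rgb.to(device)
    # LR input generated from the HR ground truth by bicubic downsampling.
    lr_depth = F.interpolate(hr_depth, scale_factor=1.0 / scale,
                             mode="bicubic", align_corners=False)
    loss = F.l1_loss(model(lr_depth, rgb), hr_depth)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```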