Unsupervised Cross-Spectral Stereo Matching by Learning to Synthesize

Authors: Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song. Pages 8706-8713.

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results show that our method achieves good performance without using depth supervision or explicit semantic information. We evaluate on the Pitts Stereo RGBNIR dataset proposed by (Zhi et al. 2018), which covers many material categories including lights, glass, glossy surfaces, vegetation, skin, clothing and bags. This dataset was captured by a visible (VIS) and near-infrared (NIR) camera pair. We define the left VIS as spectrum A and the right NIR as spectrum B. The left VIS consists of three spectral bands while the right NIR consists of only one band. For simplicity of implementation, we convert NIR images into three channels. Table 1 presents the comparison with disparity RMSE and execution time.
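The single-band-to-three-channel NIR conversion mentioned above can be sketched as follows. This is a minimal illustration assuming NumPy image arrays; the function name and exact replication strategy are assumptions, since the paper does not publish its preprocessing code:

```python
import numpy as np

def nir_to_three_channels(nir: np.ndarray) -> np.ndarray:
    """Replicate a single-band NIR image of shape (H, W) into three
    identical channels of shape (H, W, 3), matching the VIS input shape."""
    if nir.ndim != 2:
        raise ValueError("expected a single-band (H, W) image")
    return np.repeat(nir[:, :, np.newaxis], 3, axis=2)

# Example: a 384x512 single-band image becomes 384x512x3.
nir = np.zeros((384, 512), dtype=np.float32)
rgb_like = nir_to_three_channels(nir)
print(rgb_like.shape)  # (384, 512, 3)
```

Channel replication keeps the NIR intensities unchanged while letting the same three-channel network backbone process both spectra.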
Researcher Affiliation Collaboration Mingyang Liang,1,2 Xiaoyang Guo,3 Hongsheng Li,3 Xiaogang Wang,3 You Song1 1Beihang University, Beijing, China 2Sense Time Research 3The Chinese University of Hong Kong, Hong Kong, China
Pseudocode No The paper describes the iterative optimization steps and network architecture but does not include formal pseudocode or an algorithm block.
Open Source Code No The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets Yes We evaluate on the Pitts Stereo RGBNIR dataset proposed by (Zhi et al. 2018) which covers many material categories including lights, glass, glossy surfaces, vegetation, skin, clothing and bags.
Dataset Splits No The dataset is split into two sets for training (40000 pairs) and testing (2000 pairs), which is the same as (Zhi et al. 2018). While training and testing splits are mentioned, no specific validation set size or percentage is provided.
Hardware Specification Yes The proposed methods are tested on a single NVIDIA TITAN Xp GPU, which is the same as (Zhi et al. 2018). The training process takes about 34 hours using 8 Nvidia TITAN Xp GPUs.
Software Dependencies No The paper mentions several frameworks and optimizers like Dispnet, Adam optimizer, Cycle GAN, and Kaiming initialization, but it does not specify any software libraries with version numbers (e.g., Python 3.x, PyTorch 1.x) which would be necessary for full reproducibility.
Experiment Setup Yes The SMN predicts the disparity directly instead of the ratio between disparity and image width. A scaling factor η = 0.008 is multiplied to the predictions for stable optimization. The weights of the losses in STN are set to λc = 10, λr = 5, λa = 1, λd = 1, and the weights of losses in SMN are αap = 1, αds = 0.2, αlr = 0.1, αaux = 20. We use 5x5 window for calculating the structural similarity δ, and the α in Equ. 11 is set to 0.9. The STN and SMN are trained on 40000 cross-spectral image pairs with Adam optimizer (Kingma and Ba 2014) (batch size = 16 and learning rate = 0.0002). For data augmentation, we flip the input images of STN horizontally with a 50% chance. Input images are resized into 512x384 for the entire network. We perform an instance normalization on the images provided to the SMN as input.
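Equ. 11 itself is not reproduced in this report, but the described ingredients (a 5x5 window for the structural similarity δ and α = 0.9) match the standard SSIM/L1 appearance-matching loss popularized by Godard et al. for self-supervised stereo. The following NumPy sketch implements that common form under this assumption; it is not the authors' code:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_mean(img: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean over all k x k windows (valid region only)."""
    return sliding_window_view(img, (k, k)).mean(axis=(-2, -1))

def ssim(x: np.ndarray, y: np.ndarray, k: int = 5,
         c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> np.ndarray:
    """Per-window structural similarity with a k x k mean filter."""
    mu_x, mu_y = local_mean(x, k), local_mean(y, k)
    sigma_x = local_mean(x * x, k) - mu_x ** 2
    sigma_y = local_mean(y * y, k) - mu_y ** 2
    sigma_xy = local_mean(x * y, k) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return num / den

def appearance_loss(img: np.ndarray, recon: np.ndarray,
                    alpha: float = 0.9, k: int = 5) -> float:
    """Alpha-weighted blend of SSIM dissimilarity and L1 difference,
    with alpha = 0.9 and a 5x5 window as stated in the setup."""
    d_ssim = np.clip((1.0 - ssim(img, recon, k)) / 2.0, 0.0, 1.0).mean()
    l1 = np.abs(img - recon).mean()
    return float(alpha * d_ssim + (1.0 - alpha) * l1)
```

With α = 0.9 the structural term dominates, which makes the loss more robust to the brightness and contrast differences that arise between spectra.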