Matching neural paths: transfer from recognition to correspondence search

Authors: Nikolay Savinov, Lubor Ladicky, Marc Pollefeys

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical validation is done on the task of stereo correspondence; the authors demonstrate competitive results among methods that do not use labeled target-domain data.
Researcher Affiliation | Collaboration | Nikolay Savinov¹, Lubor Ladicky¹, Marc Pollefeys¹,²; ¹Department of Computer Science, ETH Zurich; ²Microsoft. {nikolay.savinov,lubor.ladicky,marc.pollefeys}@inf.ethz.ch
Pseudocode | Yes | Algorithm 1 ("Backward pass") is provided.
Open Source Code | No | The paper cites a GitHub repository for a baseline method ([24] Jure Zbontar and Yann LeCun. MC-CNN GitHub repository. https://github.com/jzbontar/mc-cnn, 2016) but provides no link or availability statement for the source code of the proposed method itself.
Open Datasets | Yes | Empirical validation is done on the task of stereo correspondence using two public datasets: KITTI 2012 [6] and KITTI 2015 [14].
Dataset Splits | Yes | "The dataset consists of 194 training image pairs and 195 test image pairs. The reflective surfaces like windshields were excluded from the ground truth. ... For each training pair, the ground-truth shift is measured densely per-pixel. ... we measure the quality on the same 40 validation images."
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU or CPU models) used for the experiments. It only mentions "our own cluster was down" in the acknowledgements.
Software Dependencies | No | The paper mentions using the VGG-16 network and refers to a baseline's post-processing code, but it does not specify software dependencies with version numbers for its own implementation.
Experiment Setup | Yes | "For this task, the input data dimensionality is B = 2 and the shift set is represented by horizontal shifts D = {(0, 0, 0), ..., (Dmax, 0, 0)}. We always convert images to grayscale before running CNNs, following the observation by [25] that color does not help. For the pre-trained recognition CNN, we chose the VGG-16 network [20]. ... In particular, we usually started from layer 2 and finished at layer 8. As such, it is still necessary to consider multi-channel input. ... We will thus abbreviate our methods as ours(s, t), where s is the starting layer and t is the last layer. ... First, we obtain the raw scores U(x, d) from Algorithm 1 for the shifts up to Dmax = 228. Then we normalize the scores U(x, ·) per-pixel by dividing them by the maximal score, thus mapping them into the range [0, 1], suitable for running the post-processing code [24]. Finally, we run the post-processing code with exactly the same parameters as the original method [25]..."
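The per-pixel score normalization quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the raw scores U(x, d) are non-negative and stored in a hypothetical H x W x D NumPy array (one score per pixel per candidate shift); the function name and array layout are my own.

```python
import numpy as np

def normalize_scores(U):
    """Per-pixel normalization of raw matching scores into [0, 1].

    U: array of shape (H, W, D) holding a raw score U(x, d) for every
    pixel x and every candidate horizontal shift d (layout is an
    assumption for illustration). Each pixel's score vector U(x, .)
    is divided by its own maximum, as described in the paper's setup.
    Assumes non-negative scores.
    """
    U = np.asarray(U, dtype=np.float64)
    max_per_pixel = U.max(axis=-1, keepdims=True)
    # Guard pixels whose scores are all zero to avoid division by zero.
    max_per_pixel = np.where(max_per_pixel == 0.0, 1.0, max_per_pixel)
    return U / max_per_pixel

# Usage sketch: Dmax = 228 gives 229 candidate shifts per pixel.
raw = np.random.rand(8, 8, 229)
normalized = normalize_scores(raw)
```

After this step every pixel has a best-scoring shift with value exactly 1.0, which is what makes the scores "suitable" as input to the baseline's post-processing code.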