Hierarchical Neural Architecture Search for Deep Stereo Matching

Authors: Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Yuchao Dai, Xiaojun Chang, Hongdong Li, Tom Drummond, Zongyuan Ge

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our searched network outperforms all state-of-the-art deep stereo matching architectures and is ranked at the top 1 accuracy on KITTI stereo 2012, 2015 and Middlebury benchmarks, as well as the top 1 on Scene Flow dataset with a substantial improvement on the size of the network and the speed of inference. In this section, we adopt Scene Flow dataset [3] as the source dataset to analyze our architecture search outcome. We then conduct the architecture evaluation on KITTI 2012 [29], KITTI 2015 [30] and Middlebury 2014 [31] benchmarks by inheriting the searched architecture from Scene Flow dataset. In our ablation study, we analyze the effect of changing search space as well as different search strategies.
Researcher Affiliation | Collaboration | Xuelian Cheng1,5, *Yiran Zhong2,6, Mehrtash Harandi1,7, Yuchao Dai3, Xiaojun Chang1, Tom Drummond1, Hongdong Li2,6, Zongyuan Ge1,4,5 1Faculty of Engineering, Monash University, 2Australian National University, 3Northwestern Polytechnical University, 4eResearch Centre, Monash University, 5Airdoc Research Australia, 6ACRV, 7Data61, CSIRO
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at LEAStereo.
Open Datasets | Yes | We conduct full architecture search on Scene Flow dataset [3]. It contains 35,454 training and 4,370 testing synthetic images with a typical image dimension of 540×960. ... We then conduct the architecture evaluation on KITTI 2012 [29], KITTI 2015 [30] and Middlebury 2014 [31] benchmarks by inheriting the searched architecture from Scene Flow dataset.
Dataset Splits | Yes | We randomly select 20,000 image pairs from the training set as our search-training-set, and another 1,000 image pairs from the training set are used as the search-validation-set following [18].
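The split described above can be reproduced with a short sketch. The helper name and fixed seed are assumptions (the paper does not report a seed), and the two subsets are drawn disjointly from the full training set, following standard practice:

```python
import random

def make_search_splits(train_pairs, n_train=20_000, n_val=1_000, seed=0):
    """Randomly draw disjoint search-training and search-validation
    subsets from the full Scene Flow training set."""
    rng = random.Random(seed)  # fixed seed is an assumption for reproducibility
    picked = rng.sample(train_pairs, n_train + n_val)
    return picked[:n_train], picked[n_train:]

# Scene Flow has 35,454 training pairs; integer indices stand in for image pairs.
search_train, search_val = make_search_splits(list(range(35_454)))
```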
Hardware Specification | Yes | The entire architecture search optimization takes about 10 GPU days on an NVIDIA V100 GPU.
Software Dependencies | No | We implement our LEAStereo network in Pytorch. The paper only names PyTorch and does not give a specific version number.
Experiment Setup | Yes | We search the architecture for a total of 10 epochs: the first three epochs to initiate the weight w of the super-network and avoid bad local minima; the remaining epochs to update the architecture parameters α, β. We use SGD optimizer with momentum 0.9, cosine learning rate that decays from 0.025 to 0.001, and weight decay 0.0003.
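The reported cosine decay from 0.025 to 0.001 over the 10 search epochs can be written out in plain Python; the per-epoch formula below mirrors the standard cosine-annealing schedule (as in PyTorch's `CosineAnnealingLR`), with the epoch count taken from the row above:

```python
import math

def cosine_lr(epoch, total_epochs=10, lr_max=0.025, lr_min=0.001):
    """Cosine learning-rate decay from lr_max at epoch 0 down to lr_min
    at the final epoch, matching the hyperparameters reported above."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

# Per-epoch learning rates: starts at 0.025, ends at 0.001.
schedule = [cosine_lr(e) for e in range(11)]
```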