Neural Outlier Rejection for Self-Supervised Keypoint Learning

Authors: Jiexiong Tang, Hanme Kim, Vitor Guizilini, Sudeep Pillai, Rares Ambrus

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments and ablative analysis, we show that the proposed self-supervised keypoint learning method greatly improves the quality of feature matching and homography estimation on challenging benchmarks over the state-of-the-art.
Researcher Affiliation | Collaboration | Jiexiong Tang1,2, Hanme Kim1, Vitor Guizilini1, Sudeep Pillai1, Rares Ambrus1. 1 Toyota Research Institute (TRI), 2 KTH Royal Institute of Technology. 1{first.last}@tri.global, 2jiexiong@kth.se
Pseudocode | No | The paper provides architectural diagrams (Figure 2) and detailed network tables (Tables 6 and 7) but does not include pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/TRI-ML/KP2D
Open Datasets | Yes | We train our method using the COCO dataset (Lin et al., 2014), specifically the 2017 version which contains 118k training images. (...) We evaluate our method on image sequences from the HPatches dataset (Balntas et al., 2017), which contains 57 illumination and 59 viewpoint sequences.
Dataset Splits | No | The paper trains self-supervised on the 118k images of COCO 2017 but does not specify a train/validation/test split for its own training process, by either percentage or count.
Hardware Specification | Yes | To quantify our runtime performance, we evaluated our model on a desktop with an Nvidia Titan Xp GPU on images of 240x320 resolution.
Software Dependencies | No | The paper mentions "PyTorch (Paszke et al., 2017)" and "ADAM (Kingma & Ba, 2014)" but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | We set the learning rate to 10^-3 and train for 50 epochs with a batch size of 8, halving the learning rate once after 40 epochs of training. The weights of both networks are randomly initialized. We set the weights for the total training loss as defined in Equation (6) to α = 1, β = 2, and λ = 1. These weights are selected to balance the scales of the different terms. We set σ1 = 2 in order to avoid border effects while maintaining keypoints distributed over the image, as described in Section 3.1. The triplet loss margin m is set to 0.2. The relaxation criterion c for negative sample mining is set to 8. When training the outlier rejection network described in Section 3.2, we set K = 300, i.e. we choose the lowest-scoring 300 pairs to train on. We perform the same types of homography adaptation operations as DeTone et al. (2018b): crop, translation, scale, rotation, and symmetric perspective transform. After cropping the image with a factor of 0.7 (relative to the original image resolution), the amplitudes for the other transforms are sampled uniformly from a pre-defined range: scale [0.8, 1.2], rotation [0, π/4] and perspective [0, 0.2]. Following Christiansen et al. (2019), we then apply non-spatial augmentation separately on the source and target frames to allow the network to learn illumination invariance. We add random per-pixel Gaussian noise with magnitude 0.02 (for image intensity normalized to [0, 1]) and Gaussian blur with kernel sizes [1, 3, 5], together with color augmentation in brightness [0.5, 1.5], contrast [0.5, 1.5], saturation [0.8, 1.2] and hue [-0.2, 0.2]. In addition, we randomly shuffle the color channels and convert the color image to grayscale with probability 0.5.
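
To make the optimizer settings in the Experiment Setup row concrete, the sketch below wires up the reported hyperparameters in PyTorch. The two nn.Module placeholders stand in for the paper's keypoint network and outlier rejection network (the real implementations are in the KP2D repository linked above), and the loss-term names L_loc, L_desc, L_score are assumed labels for the three components weighted by α, β, λ in Equation (6); this is a minimal sketch, not the released training code.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the paper's KeypointNet and IO-Net
# (see https://github.com/TRI-ML/KP2D for the real architectures).
keypoint_net = nn.Conv2d(3, 65, kernel_size=3, padding=1)
io_net = nn.Linear(5, 1)

# Both networks are randomly initialized and optimized jointly with ADAM.
params = list(keypoint_net.parameters()) + list(io_net.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)  # learning rate 10^-3

# Halve the learning rate once, after epoch 40 of 50.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40], gamma=0.5)

alpha, beta, lam = 1.0, 2.0, 1.0  # loss weights from Equation (6)

for epoch in range(50):
    # ... iterate over COCO 2017 with batch size 8 and backprop
    # loss = alpha * L_loc + beta * L_desc + lam * L_score ...
    scheduler.step()
```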
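The homography adaptation step can be sketched directly from the stated ranges. The composition order and the normalized-coordinate perspective parameterization below are assumptions; the reference procedure is in DeTone et al. (2018b) and the KP2D code. The initial crop (factor 0.7) and translation are noted as comments for brevity.

```python
import numpy as np

def sample_homography(rng: np.random.Generator) -> np.ndarray:
    """Sample a random 3x3 homography from the ranges reported in the paper.

    Operates in normalized image coordinates. The composition order and the
    perspective parameterization are assumptions, not the paper's exact code.
    """
    # A crop with factor 0.7 of the original resolution and a random
    # translation are applied first in the paper's pipeline; omitted here.
    scale = rng.uniform(0.8, 1.2)           # scale in [0.8, 1.2]
    angle = rng.uniform(0.0, np.pi / 4.0)   # rotation in [0, pi/4]
    px, py = rng.uniform(0.0, 0.2, size=2)  # symmetric perspective in [0, 0.2]

    S = np.diag([scale, scale, 1.0])
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [px,  py,  1.0]])
    return P @ R @ S

H = sample_homography(np.random.default_rng(0))
```

The target frame is then generated by warping the source image with H (e.g. cv2.warpPerspective), and the known H provides the self-supervised correspondence signal for keypoints and descriptors.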
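Finally, the non-spatial augmentation can be approximated with torchvision; the ColorJitter arguments below are chosen so its sampling ranges match the reported brightness [0.5, 1.5], contrast [0.5, 1.5], saturation [0.8, 1.2] and hue [-0.2, 0.2]. This is a sketch of equivalent behavior under those assumptions, not the paper's implementation (which follows Christiansen et al. (2019)).

```python
import random
import torch
from torchvision import transforms  # needs tensor-transform support (>= 0.8)

# ColorJitter(b, c, s, h) samples factors in [1-b, 1+b], [1-c, 1+c],
# [1-s, 1+s] and hue in [-h, h], matching the ranges quoted above.
color_jitter = transforms.ColorJitter(
    brightness=0.5, contrast=0.5, saturation=0.2, hue=0.2)

def augment(img: torch.Tensor) -> torch.Tensor:
    """Non-spatial augmentation for a float image in [0, 1], shape (3, H, W),
    applied independently to the source and target frames."""
    img = color_jitter(img)
    # Gaussian blur with a kernel size drawn from {1, 3, 5} (1 = no blur).
    k = random.choice([1, 3, 5])
    if k > 1:
        img = transforms.GaussianBlur(kernel_size=k)(img)
    # Per-pixel Gaussian noise with magnitude 0.02.
    img = (img + 0.02 * torch.randn_like(img)).clamp(0.0, 1.0)
    # Randomly shuffle the color channels.
    img = img[torch.randperm(3)]
    # Convert to grayscale (replicated to 3 channels) with probability 0.5.
    if random.random() < 0.5:
        img = transforms.Grayscale(num_output_channels=3)(img)
    return img
```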