Mask-Homo: Pseudo Plane Mask-Guided Unsupervised Multi-Homography Estimation

Authors: Yasi Wang, Hong Liu, Chao Zhang, Lu Xu, Qiang Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted to verify the effectiveness of the proposed method.
Researcher Affiliation | Collaboration | (1) Samsung Research China Beijing (SRC-B), China; (2) Department of Biomedical Engineering, Eindhoven University of Technology, Netherlands
Pseudocode | Yes | The pseudo code of this module is depicted in Algorithm 1.
Open Source Code | Yes | Our code is available at https://github.com/SAITPublic/Mask Homo.
Open Datasets | Yes | Our method is evaluated on a natural image dataset (Zhang et al. 2020; Liu et al. 2022a) with 75.8k training pairs and 4.2k testing pairs.
Dataset Splits | No | The paper states '75.8k training pairs and 4.2k testing pairs' but does not specify a validation set or its split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions 'Adam optimizer' but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For training, we randomly crop 384 × 512 patches near the center of original images to avoid out-of-bound coordinates after warping. Other parameters for H estimation transformer are same to (Hong et al. 2022). Adam optimizer (P. Kingma and Ba 2015) is employed. For H estimation training, the learning rate is 1 × 10^-4, which decays by a factor of 0.8 after every epoch, batch size is 8 and it takes 10 epochs to train. For segmentation post-processing, there are five hyperparameters involved: S_hole, S_seg, D_min, S_rto and N. The first four parameters determine the shape of generated segmentation masks. We empirically find that D_min affects the diversity of generated segmentation masks much more significantly than others. Thus in our experiments, we fix S_hole = 500, S_seg = 10,000, S_rto = 15%, while D_min is varied to investigate the influence of segmentation post-processing on the performance. The last parameter N decides the maximum number of pseudo plane masks within each image pair in inference, which is empirically set to 4.
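The reported setup can be collected into a small configuration sketch, which makes the learning-rate schedule and the near-center crop easier to check. This is an illustration, not the authors' code. The names `TrainConfig`, `lr_at_epoch`, `sample_center_crop` and the `max_jitter` parameter are hypothetical. Reading "384 × 512" as height × width is an assumption. The paper's definition of "near the center" is not given here, so the jitter range below is a placeholder. D_min is left out because the source varies it rather than fixing it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup row; field names are hypothetical.
    crop_height: int = 384   # assumption: "384 x 512" is height x width
    crop_width: int = 512
    base_lr: float = 1e-4    # initial learning rate
    lr_decay: float = 0.8    # multiplicative decay applied after every epoch
    batch_size: int = 8
    epochs: int = 10
    s_hole: int = 500        # S_hole
    s_seg: int = 10_000      # S_seg
    s_rto: float = 0.15      # S_rto = 15%
    max_planes: int = 4      # N, maximum pseudo plane masks per image pair


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Learning rate used during `epoch` (0-indexed) under per-epoch decay."""
    return cfg.base_lr * cfg.lr_decay ** epoch


def sample_center_crop(img_h, img_w, cfg, max_jitter=16, rng=random):
    """Return (top, left) of a crop whose origin is jittered around the center.

    max_jitter is a placeholder; the source does not say how far from the
    center the crops may fall.
    """
    if img_h < cfg.crop_height or img_w < cfg.crop_width:
        raise ValueError("image is smaller than the crop size")
    top = (img_h - cfg.crop_height) // 2 + rng.randint(-max_jitter, max_jitter)
    left = (img_w - cfg.crop_width) // 2 + rng.randint(-max_jitter, max_jitter)
    # Clamp so the crop always stays inside the image.
    top = min(max(top, 0), img_h - cfg.crop_height)
    left = min(max(left, 0), img_w - cfg.crop_width)
    return top, left


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in range(cfg.epochs):
        print(epoch, lr_at_epoch(cfg, epoch))
```

In a PyTorch training loop, the same schedule would typically be expressed with `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.8)` stepped once per epoch. The source does not state which framework or scheduler was actually used.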