Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multi-View Dynamic Reflection Prior for Video Glass Surface Detection

Authors: Fang Liu, Yuhao Liu, Jiaying Lin, Ke Xu, Rynson W.H. Lau

AAAI 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that VGSD-Net outperforms state-of-the-art approaches adapted from related fields.
Researcher Affiliation | Academia | Department of Computer Science, City University of Hong Kong
Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured algorithm blocks in a code-like format.
Open Source Code | No | Code and dataset will be available at https://github.com/fawnliu/VGSD.
Open Datasets | No | We have also created the first large-scale video glass surface dataset (VGSD-D), consisting of 19,166 image frames with accurately-annotated glass masks extracted from 297 videos. Code and dataset will be available at https://github.com/fawnliu/VGSD.
Dataset Splits | No | They are randomly divided into a training set (12,315 frames from 192 videos) and a testing set (6,851 frames from 105 videos).
Hardware Specification | Yes | We build our method using the PyTorch toolbox and conduct all experiments on a Tesla V100 GPU with 32 GB memory.
Software Dependencies | No | The paper mentions building the method 'using PyTorch toolbox' but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | We adopt the Adam optimizer with a weight decay of 5×10⁻⁴ and a maximum learning rate of 5×10⁻⁵. The cosine learning rate scheduler and warm-up are used to adjust the learning rate. The batch size and training epochs are 5 and 15. The input images were randomly flipped horizontally and resized to 416×416 for network training. We employ ResNeXt-101 (Xie et al. 2017) pre-trained on ImageNet as the encoder. We set the number of masked deformable blocks in LRE to m = 4. The window size and k in MDA are empirically set to 7×7 and 4.
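The dataset-split row above (297 videos divided into 192 training and 105 testing videos) can be illustrated with a minimal video-level split, which keeps all frames of one video in the same partition. This is a hedged sketch only: the paper does not publish its split procedure, and the function name and seed here are hypothetical.

```python
import random

def split_videos(video_ids, n_train, seed=0):
    """Randomly split video IDs into train/test lists at the video level,
    so frames from one video never appear in both splits.
    Illustrative sketch; not the authors' actual split code."""
    rng = random.Random(seed)
    ids = list(video_ids)
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# VGSD-D: 297 videos -> 192 training videos, 105 testing videos.
train_videos, test_videos = split_videos(range(297), n_train=192)
```

Splitting by video rather than by frame avoids leakage of near-duplicate frames between the training and testing sets.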
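The experiment-setup row describes a cosine learning-rate schedule with warm-up and a maximum learning rate of 5×10⁻⁵. A minimal sketch of such a schedule is shown below; the total step count and warm-up length are assumptions, as the quoted text does not specify them.

```python
import math

def lr_at_step(step, total_steps, max_lr=5e-5, warmup_steps=500):
    """Cosine learning-rate schedule with linear warm-up.

    Ramps linearly from 0 to max_lr over `warmup_steps`, then decays to 0
    along half a cosine cycle over the remaining steps.
    (warmup_steps=500 is an assumption; the paper does not state it.)
    """
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop, a function like this would typically be attached via `torch.optim.lr_scheduler.LambdaLR`, alongside `torch.optim.Adam(..., weight_decay=5e-4)` to match the quoted weight decay.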