One-Shot Affordance Detection

Authors: Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, Dacheng Tao

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experimental results demonstrate the superiority of our model over previous representative ones in terms of both objective metrics and visual quality. The benchmark suite is at Project Page." |
| Researcher Affiliation | Collaboration | Hongchen Luo¹, Wei Zhai¹,³, Jing Zhang², Yang Cao¹, Dacheng Tao³ (¹University of Science and Technology of China, China; ²The University of Sydney, Australia; ³JD Explore Academy, JD.com, China) |
| Pseudocode | No | The paper describes its methodology using mathematical equations and text, and illustrates network architectures with diagrams, but it does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The abstract mentions "The benchmark suite is at Project Page." but does not provide a direct link to a source-code repository or explicitly state that the code for the described methodology is publicly available. |
| Open Datasets | Yes | "We construct the Purpose-driven Affordance Dataset (PAD) with images mainly from ILSVRC [Russakovsky et al., 2015], COCO [Lin et al., 2014], etc." |
| Dataset Splits | Yes | "To benchmark different models comprehensively, we follow the k-fold evaluation protocol, where k is 3 in this paper. To this end, the dataset is divided into three parts with non-overlapped categories, where any two of them are used for training while the left part is used for testing. See the supplementary material for more details about the setting." |
| Hardware Specification | Yes | "We train the model for 40 epochs on a single NVIDIA 1080ti GPU with an initial learning rate 1e-4." |
| Software Dependencies | No | "Our method is implemented in Pytorch and trained with the Adam optimizer [Kingma and Ba, 2014]. The backbone is resnet50 [He et al., 2016]." No version numbers are specified for PyTorch or other software dependencies. |
| Experiment Setup | Yes | "We train the model for 40 epochs on a single NVIDIA 1080ti GPU with an initial learning rate 1e-4. The number of bases in the collaboration enhancement module is set to K=256. The number of E-M iteration steps is 3. Besides, two segmentation models (UNet [Ronneberger et al., 2015], PSPNet [Zhao et al., 2017]), three saliency detection models (CPD [Wu et al., 2019], BASNet [Qin et al., 2019], CSNet [Gao et al., 2020]) and one co-saliency detection model (CoEGNet [Fan et al., 2021]) are chosen for comparison." |
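The Dataset Splits row quotes a 3-fold protocol in which object categories are partitioned into three non-overlapping groups, with any two groups used for training and the held-out group used for testing. The snippet below is a minimal sketch of that partitioning logic; the category names and count are placeholders, not the actual PAD split, which is documented in the paper's supplementary material.

```python
# Sketch of the quoted 3-fold, category-disjoint evaluation protocol.
# The category list is hypothetical; only the splitting logic is illustrated.
categories = [f"category_{i:02d}" for i in range(72)]  # placeholder category names

# Partition the categories into 3 disjoint folds.
folds = [categories[i::3] for i in range(3)]

# Enumerate the three train/test configurations: train on two folds, test on the third.
for test_idx in range(3):
    train_cats = [c for i in range(3) if i != test_idx for c in folds[i]]
    test_cats = folds[test_idx]
    assert not set(train_cats) & set(test_cats)  # train and test share no categories
    print(f"fold {test_idx}: train on {len(train_cats)} categories, test on {len(test_cats)}")
```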
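The Hardware Specification, Software Dependencies, and Experiment Setup rows together pin down the quoted training configuration: PyTorch, the Adam optimizer with an initial learning rate of 1e-4, 40 epochs, a ResNet-50 backbone, K=256 bases in the collaboration enhancement module, and 3 E-M iteration steps. The sketch below wires those hyperparameters into a generic PyTorch training step; the truncated backbone, 1x1 prediction head, dummy batch, and BCE loss are stand-ins and not the paper's actual one-shot affordance network.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

EPOCHS = 40       # quoted training length
LR = 1e-4         # quoted initial learning rate
NUM_BASES = 256   # K, bases in the collaboration enhancement module (module not reproduced here)
EM_STEPS = 3      # quoted number of E-M iteration steps (not reproduced here)

# ResNet-50 backbone as stated in the paper; the 1x1 head and upsampling
# below are placeholders for the actual affordance detection decoder.
backbone = nn.Sequential(*list(resnet50().children())[:-2])  # keep conv features, drop avgpool/fc
head = nn.Sequential(
    nn.Conv2d(2048, 1, kernel_size=1),
    nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(model.parameters(), lr=LR)  # Adam, as quoted
criterion = nn.BCEWithLogitsLoss()                       # assumed mask loss, not stated in the quotes

# Dummy batch standing in for PAD images and affordance masks.
images = torch.randn(2, 3, 224, 224)
masks = torch.rand(2, 1, 224, 224)

for epoch in range(EPOCHS):
    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()
    optimizer.step()
```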