Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings

Authors: Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we demonstrate results of experiments conducted on the PASCAL-5i dataset [Shaban et al., 2017] compared to state of the art methods in section 5.2. We then demonstrate the results for the different variants of our approach depicted in Fig. 3 and experiment with the proposed TOSFL setup in section 5.3.
Researcher Affiliation | Collaboration | 1 University of Alberta; 2 Indian Institute of Science; 3 Element AI; 4 HiSilicon, Huawei Research
Pseudocode | No | The paper includes figures describing the model architecture, but it does not provide any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We utilize a ResNet-50 [He et al., 2016] encoder pre-trained on ImageNet [Deng et al., 2009] to extract visual features... PASCAL-5i splits PASCAL-VOC 20 classes into 4 folds... Table 2 demonstrates results on MS-COCO [Lin et al., 2014]... Our setup relies on the image-level label for the support image to segment different parts from the query image conditioned on the word embeddings of this image-level label. We utilize the Youtube-VOS dataset training data, which has 65 classes, and we split them into 5 folds. (See the encoder sketch below the table.)
Dataset Splits | Yes | In order to ensure the evaluation for the few-shot method is not biased to a certain category, it is best to split into multiple folds and evaluate on different ones, similar to [Shaban et al., 2017]... PASCAL-5i splits PASCAL-VOC 20 classes into 4 folds, each having 5 classes... In each fold the model is meta-trained for a maximum of 50 epochs on the classes outside the test fold on PASCAL-5i, and 20 epochs on both MS-COCO and Youtube-VOS. (See the fold-split sketch below the table.)
Hardware Specification | No | The paper describes the training process and parameters (e.g., 'momentum SGD', 'Batch size of 4'), but it does not specify any hardware components such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | No | The paper mentions models like 'ResNet-50' and optimizers like 'momentum SGD', but it does not list specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, CUDA 11.x).
Experiment Setup | Yes | We train all models using momentum SGD with learning rate 0.01 that is reduced by 0.1 at epochs 35, 40 and 45, and momentum 0.9. L2 regularization with a factor of 5x10^-4 is used to avoid over-fitting. A batch size of 4 and input resolution of 321x321 are used during training with random horizontal flipping and random centered cropping for the support set. An input resolution of 500x500 is used for the meta-testing phase, similar to [Shaban et al., 2017]. (See the training-configuration sketch below the table.)
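
The Open Datasets row quotes the paper's use of a ResNet-50 encoder pre-trained on ImageNet to extract visual features. A minimal sketch of such an extractor, assuming a PyTorch/torchvision implementation (the paper does not name its framework):

```python
import torch
import torchvision

# ImageNet-pretrained ResNet-50 backbone (torchvision is an assumption;
# the paper only states "ResNet-50 pre-trained on ImageNet").
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")

# Drop the average pool and classification head, keeping the
# convolutional stages as the visual encoder.
encoder = torch.nn.Sequential(*list(backbone.children())[:-2])
encoder.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 321, 321)  # dummy support/query batch
    features = encoder(images)            # -> shape (4, 2048, 11, 11)
print(features.shape)
```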
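The Dataset Splits row describes the PASCAL-5i protocol: 20 PASCAL-VOC classes split into 4 folds of 5, with meta-training restricted to the classes outside the held-out test fold. A sketch of that split, assuming the standard contiguous class blocks of [Shaban et al., 2017]:

```python
ALL_CLASSES = list(range(20))  # PASCAL-VOC class ids 0..19

def pascal5i_split(test_fold: int):
    """Return (meta-train classes, meta-test classes) for one of the 4 folds."""
    assert 0 <= test_fold < 4
    test_classes = ALL_CLASSES[test_fold * 5:(test_fold + 1) * 5]
    train_classes = [c for c in ALL_CLASSES if c not in test_classes]
    return train_classes, test_classes

for fold in range(4):
    train_cls, test_cls = pascal5i_split(fold)
    print(f"fold {fold}: meta-train {train_cls}, meta-test {test_cls}")
```

The same pattern would extend to the paper's 5-fold split of Youtube-VOS's 65 training classes (13 classes per fold).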
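The Experiment Setup row pins down the optimization hyperparameters. A hedged sketch of that configuration in PyTorch (the framework is an assumption, and `model` is a placeholder for the paper's network):

```python
import torch

model = torch.nn.Conv2d(3, 1, 3)  # placeholder standing in for the actual network

# Momentum SGD with the quoted hyperparameters; weight_decay implements
# the L2 regularization factor of 5x10^-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4
)

# Reduce the learning rate by a factor of 0.1 at epochs 35, 40 and 45.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[35, 40, 45], gamma=0.1
)

for epoch in range(50):  # maximum of 50 meta-training epochs on PASCAL-5i
    # ... one epoch over episodes: batch size 4, 321x321 inputs,
    # random horizontal flips and random centered crops on the support set ...
    scheduler.step()
```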