StarNet: towards Weakly Supervised Few-Shot Object Detection

Authors: Leonid Karlinsky, Joseph Shtok, Amit Alfassy, Moshe Lichtenstein, Sivan Harary, Eli Schwartz, Sivan Doveh, Prasanna Sattigeri, Rogerio Feris, Alex Bronstein, Raja Giryes | Pages 1743-1753

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce StarNet, a few-shot model featuring an end-to-end differentiable non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels to produce good features for jointly localizing and classifying previously unseen categories of few-shot test tasks, using a star-model that geometrically matches between the query and support images (to find corresponding object instances). Being a few-shot detector, StarNet does not require any bounding box annotations, neither during pre-training nor for novel-class adaptation. It can thus be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks that are less cropped around the objects (where object localization is key).
Researcher Affiliation | Collaboration | Leonid Karlinsky*1, Joseph Shtok*1, Amit Alfassy*1,3, Moshe Lichtenstein*1, Sivan Harary1, Eli Schwartz1,2, Sivan Doveh1, Prasanna Sattigeri1, Rogerio Feris1, Alexander Bronstein3, Raja Giryes2 (1 IBM Research AI, 2 Tel-Aviv University, 3 Technion). Contact: leonidka@il.ibm.com
Pseudocode | Yes | Figure 2 and Algorithm 1 provide an overview of our approach. Algorithm 1: StarNet training.
Open Source Code | Yes | Our code is available at: https://github.com/leokarlin/StarNet
Open Datasets | Yes | The CUB fine-grained dataset (Wah et al. 2011) consists of 11,788 images of birds of 200 species. The ImageNet LOC-FS dataset (Karlinsky et al. 2019) contains 331 animal categories from ImageNet LOC (Russakovsky et al. 2015). We used the ImageNet LOC-FS and CUB few-shot datasets, as well as the PASCAL VOC (Everingham et al. 2010) experimental setup from (Wang et al. 2020).
Dataset Splits | Yes | For each dataset we used the standard train / validation / test splits, which are completely disjoint in terms of contained classes. Only episodes generated from the training split were used for meta-training; the hyperparameters and the best model were chosen using the validation split; and the test split was used for measuring performance. As in (Lee et al. 2019), we use 1000 batches per training epoch, 2000 episodes for validation, and 1000 episodes for testing.
Hardware Specification | Yes | On a single NVIDIA K40 GPU, our running times are: 1.15 s/batch in 1-stage StarNet training; 2.2 s/batch in 2-stage StarNet training (in the same settings, (Lee et al. 2019) trains at 2.1 s/batch); and 0.01 s per query in inference. GPU peak memory was 30 MB per image.
Software Dependencies | Yes | Our implementation is in PyTorch 1.1.0 (Paszke et al. 2017), and is based on the public code of (Lee et al. 2019).
Experiment Setup | Yes | We use four 1-shot, 5-way episodes per training batch, each episode with 20 queries. The hyper-parameters σf = 0.2, σg = 2, and η = 0.5 were determined using validation. As in (Lee et al. 2019), we use 1000 batches per training epoch, 2000 episodes for validation, and 1000 episodes for testing. We train for 60 epochs, changing our base LR = 1 to 0.06, 0.012, 0.0024 at epochs 20, 40, 50 respectively.
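The "Research Type" row quotes the paper's description of a star-model head that geometrically matches local features between support and query images to localize corresponding object instances. The snippet below is a minimal, hedged sketch of that general voting idea, not the authors' implementation: the function name, the image-center reference point, and the dense double loop are illustrative assumptions; only the Gaussian feature-similarity kernel with bandwidth sigma_f follows the hyper-parameter quoted in the "Experiment Setup" row.

```python
import torch

def star_model_vote(f_query, f_support, center_s, sigma_f=0.2):
    """Toy star-model voting between two (C, H, W) feature maps.

    center_s: (row, col) reference point in the support map (assumed here to
    be the image center). Returns an (H, W) heatmap of votes for where the
    corresponding reference point lies in the query image.
    """
    C, H, W = f_query.shape
    q = f_query.reshape(C, -1).t()   # (H*W, C) query descriptors
    s = f_support.reshape(C, -1).t() # (H*W, C) support descriptors
    # Match weights: Gaussian in feature-space distance (sigma_f from the paper).
    w = torch.exp(-torch.cdist(q, s).pow(2) / (2 * sigma_f ** 2))
    heat = torch.zeros(H, W)
    coords = [(i, j) for i in range(H) for j in range(W)]
    for qi, (qy, qx) in enumerate(coords):
        for si, (sy, sx) in enumerate(coords):
            # Each match votes for the query location implied by the support
            # cell's offset to the reference point (the "star" geometry).
            vy, vx = qy + (center_s[0] - sy), qx + (center_s[1] - sx)
            if 0 <= vy < H and 0 <= vx < W:
                heat[vy, vx] += w[qi, si]
    return heat

if __name__ == "__main__":
    fq = torch.nn.functional.normalize(torch.randn(8, 5, 5), dim=0)
    fs = torch.nn.functional.normalize(torch.randn(8, 5, 5), dim=0)
    print(star_model_vote(fq, fs, center_s=(2, 2)).shape)  # torch.Size([5, 5])
```

The dense quadratic loop is only for readability; an efficient version would accumulate the votes with a batched tensor operation.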
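The "Dataset Splits" and "Experiment Setup" rows describe an episodic protocol: class-disjoint splits, 5-way 1-shot episodes, and 20 queries per episode. The helper below is a hypothetical sketch of such an episode sampler; the function name, the data layout, and the interpretation of "20 queries" as 4 query images per class (20 in total for 5 ways) are assumptions, not the authors' code.

```python
import random
from typing import Dict, List, Tuple

def sample_episode(class_to_images: Dict[str, List[str]],
                   n_way: int = 5,
                   k_shot: int = 1,
                   n_query_per_class: int = 4) -> Tuple[list, list]:
    """Sample one n_way-way, k_shot-shot episode from a single split.

    class_to_images maps each class of the current split (train, val, or test;
    the splits are disjoint in classes) to its image identifiers.
    """
    classes = random.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(class_to_images[cls], k_shot + n_query_per_class)
        support += [(img, label) for img in picks[:k_shot]]
        query += [(img, label) for img in picks[k_shot:]]
    return support, query

# Toy usage: 10 classes with 30 images each.
toy_split = {f"class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(10)}
support, query = sample_episode(toy_split)
print(len(support), len(query))  # 5 support images, 20 query images
```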
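The "Experiment Setup" row also quotes a step learning-rate schedule: base LR = 1, dropped to 0.06, 0.012, and 0.0024 at epochs 20, 40, and 50, over 60 epochs in total. A minimal PyTorch sketch of that schedule follows; the optimizer choice and the placeholder model are assumptions, and only the LR values and epoch milestones come from the excerpt.

```python
import torch

# Placeholder standing in for the StarNet backbone and head (assumption).
model = torch.nn.Linear(10, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

def lr_at_epoch(epoch: int) -> float:
    """Base LR = 1, dropped to 0.06 / 0.012 / 0.0024 at epochs 20 / 40 / 50."""
    if epoch < 20:
        return 1.0
    if epoch < 40:
        return 0.06
    if epoch < 50:
        return 0.012
    return 0.0024

for epoch in range(60):  # "We train for 60 epochs"
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... 1000 meta-training batches per epoch, each batch holding
    # four 1-shot, 5-way episodes (per the quoted setup) ...
```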