StarNet: towards Weakly Supervised Few-Shot Object Detection

Authors: Leonid Karlinsky, Joseph Shtok, Amit Alfassy, Moshe Lichtenstein, Sivan Harary, Eli Schwartz, Sivan Doveh, Prasanna Sattigeri, Rogerio Feris, Alex Bronstein, Raja Giryes | Pages 1743-1753

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we introduce StarNet, a few-shot model featuring an end-to-end differentiable non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels to produce good features for jointly localizing and classifying previously unseen categories of few-shot test tasks, using a star-model that geometrically matches between the query and support images (to find corresponding object instances). Being a few-shot detector, StarNet does not require any bounding box annotations, neither during pre-training nor for novel-class adaptation. It can thus be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks that are less cropped around the objects (where object localization is key).
Researcher Affiliation | Collaboration | Leonid Karlinsky*1, Joseph Shtok*1, Amit Alfassy*1,3, Moshe Lichtenstein*1, Sivan Harary1, Eli Schwartz1,2, Sivan Doveh1, Prasanna Sattigeri1, Rogerio Feris1, Alexander Bronstein3, Raja Giryes2 (1 IBM Research AI, 2 Tel-Aviv University, 3 Technion). Contact: leonidka@il.ibm.com
Pseudocode | Yes | Figure 2 and Algorithm 1 provide an overview of our approach. Algorithm 1: StarNet training.
Open Source Code | Yes | Our code is available at: https://github.com/leokarlin/StarNet
Open Datasets | Yes | The CUB fine-grained dataset (Wah et al. 2011) consists of 11,788 images of birds of 200 species. The ImageNet LOC-FS dataset (Karlinsky et al. 2019) contains 331 animal categories from ImageNet LOC (Russakovsky et al. 2015). We used the ImageNet LOC-FS and CUB few-shot datasets, as well as the PASCAL VOC (Everingham et al. 2010) experimental setup from (Wang et al. 2020).
Dataset Splits | Yes | For each dataset we used the standard train / validation / test splits, which are completely disjoint in terms of contained classes. Only episodes generated from the training split were used for meta-training; the hyperparameters and the best model were chosen using the validation split; and the test split was used for measuring performance. As in (Lee et al. 2019), we use 1000 batches per training epoch, 2000 episodes for validation, and 1000 episodes for testing.
Hardware Specification | Yes | On a single NVIDIA K40 GPU, our running times are: 1.15 s/batch in 1-stage StarNet training; 2.2 s/batch in 2-stage StarNet training (in the same settings, (Lee et al. 2019) trains at 2.1 s/batch); and 0.01 s per query in inference. GPU peak memory was 30 MB per image.
Software Dependencies | Yes | Our implementation is in PyTorch 1.1.0 (Paszke et al. 2017), and is based on the public code of (Lee et al. 2019).
Experiment Setup | Yes | We use four 1-shot, 5-way episodes per training batch, each episode with 20 queries. The hyper-parameters σf = 0.2, σg = 2, and η = 0.5 were determined using validation. As in (Lee et al. 2019), we use 1000 batches per training epoch, 2000 episodes for validation, and 1000 episodes for testing. We train for 60 epochs, changing our base LR = 1 to 0.06, 0.012, 0.0024 at epochs 20, 40, 50 respectively.
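The "Research Type" row quotes the paper's description of a star-model head that geometrically matches local features between support and query images to localize corresponding object instances. The snippet below is a minimal, hedged sketch of that general voting idea, not the authors' implementation: the function name, the image-center reference point, and the dense double loop are illustrative assumptions; only the Gaussian feature-similarity kernel with bandwidth sigma_f follows the hyper-parameter quoted in the "Experiment Setup" row.

```python
import torch

def star_model_vote(f_query, f_support, center_s, sigma_f=0.2):
    """Toy star-model voting between two (C, H, W) feature maps.

    center_s: (row, col) reference point in the support map (assumed here to
    be the image center). Returns an (H, W) heatmap of votes for where the
    corresponding reference point lies in the query image.
    """
    C, H, W = f_query.shape
    q = f_query.reshape(C, -1).t()   # (H*W, C) query descriptors
    s = f_support.reshape(C, -1).t() # (H*W, C) support descriptors
    # Match weights: Gaussian in feature-space distance (sigma_f from the paper).
    w = torch.exp(-torch.cdist(q, s).pow(2) / (2 * sigma_f ** 2))
    heat = torch.zeros(H, W)
    coords = [(i, j) for i in range(H) for j in range(W)]
    for qi, (qy, qx) in enumerate(coords):
        for si, (sy, sx) in enumerate(coords):
            # Each match votes for the query location implied by the support
            # cell's offset to the reference point (the "star" geometry).
            vy, vx = qy + (center_s[0] - sy), qx + (center_s[1] - sx)
            if 0 <= vy < H and 0 <= vx < W:
                heat[vy, vx] += w[qi, si]
    return heat

if __name__ == "__main__":
    fq = torch.nn.functional.normalize(torch.randn(8, 5, 5), dim=0)
    fs = torch.nn.functional.normalize(torch.randn(8, 5, 5), dim=0)
    print(star_model_vote(fq, fs, center_s=(2, 2)).shape)  # torch.Size([5, 5])
```

The dense quadratic loop is only for readability; an efficient version would accumulate the votes with a batched tensor operation.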
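The "Dataset Splits" and "Experiment Setup" rows describe an episodic protocol: class-disjoint splits, 5-way 1-shot episodes, and 20 queries per episode. The helper below is a hypothetical sketch of such an episode sampler; the function name, the data layout, and the interpretation of "20 queries" as 4 query images per class (20 in total for 5 ways) are assumptions, not the authors' code.

```python
import random
from typing import Dict, List, Tuple

def sample_episode(class_to_images: Dict[str, List[str]],
                   n_way: int = 5,
                   k_shot: int = 1,
                   n_query_per_class: int = 4) -> Tuple[list, list]:
    """Sample one n_way-way, k_shot-shot episode from a single split.

    class_to_images maps each class of the current split (train, val, or test;
    the splits are disjoint in classes) to its image identifiers.
    """
    classes = random.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = random.sample(class_to_images[cls], k_shot + n_query_per_class)
        support += [(img, label) for img in picks[:k_shot]]
        query += [(img, label) for img in picks[k_shot:]]
    return support, query

# Toy usage: 10 classes with 30 images each.
toy_split = {f"class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(10)}
support, query = sample_episode(toy_split)
print(len(support), len(query))  # 5 support images, 20 query images
```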
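The "Experiment Setup" row also quotes a step learning-rate schedule: base LR = 1, dropped to 0.06, 0.012, and 0.0024 at epochs 20, 40, and 50, over 60 epochs in total. A minimal PyTorch sketch of that schedule follows; the optimizer choice and the placeholder model are assumptions, and only the LR values and epoch milestones come from the excerpt.

```python
import torch

# Placeholder standing in for the StarNet backbone and head (assumption).
model = torch.nn.Linear(10, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

def lr_at_epoch(epoch: int) -> float:
    """Base LR = 1, dropped to 0.06 / 0.012 / 0.0024 at epochs 20 / 40 / 50."""
    if epoch < 20:
        return 1.0
    if epoch < 40:
        return 0.06
    if epoch < 50:
        return 0.012
    return 0.0024

for epoch in range(60):  # "We train for 60 epochs"
    for group in optimizer.param_groups:
        group["lr"] = lr_at_epoch(epoch)
    # ... 1000 meta-training batches per epoch, each batch holding
    # four 1-shot, 5-way episodes (per the quoted setup) ...
```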