StarNet: Towards Weakly Supervised Few-Shot Object Detection
Authors: Leonid Karlinsky, Joseph Shtok, Amit Alfassy, Moshe Lichtenstein, Sivan Harary, Eli Schwartz, Sivan Doveh, Prasanna Sattigeri, Rogerio Feris, Alex Bronstein, Raja Giryes (pp. 1743-1753)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce StarNet, a few-shot model featuring an end-to-end differentiable non-parametric star-model detection and classification head. Through this head, the backbone is meta-trained using only image-level labels to produce good features for jointly localizing and classifying previously unseen categories of few-shot test tasks using a star-model that geometrically matches between the query and support images (to find corresponding object instances). Being a few-shot detector, StarNet does not require any bounding box annotations, neither during pre-training, nor for novel classes adaptation. It can thus be applied to the previously unexplored and challenging task of Weakly Supervised Few-Shot Object Detection (WS-FSOD), where it attains significant improvements over the baselines. In addition, StarNet shows significant gains on few-shot classification benchmarks that are less cropped around the objects (where object localization is key). |
| Researcher Affiliation | Collaboration | Leonid Karlinsky*1, Joseph Shtok*1, Amit Alfassy*1,3, Moshe Lichtenstein*1, Sivan Harary1, Eli Schwartz1,2, Sivan Doveh1, Prasanna Sattigeri1, Rogerio Feris1, Alexander Bronstein3, Raja Giryes2 1 IBM Research AI 2 Tel-Aviv University 3 Technion leonidka@il.ibm.com |
| Pseudocode | Yes | Figure 2 and Algorithm 1 provide an overview of our approach. Algorithm 1: StarNet training |
| Open Source Code | Yes | 1Our code is available at: https://github.com/leokarlin/StarNet |
| Open Datasets | Yes | The CUB fine-grained dataset (Wah et al. 2011) consists of 11,788 images of birds of 200 species. The ImageNet LOC-FS dataset (Karlinsky et al. 2019) contains 331 animal categories from ImageNet LOC (Russakovsky et al. 2015). We used ImageNet LOC-FS and CUB few-shot datasets, as well as the PASCAL VOC (Everingham et al. 2010) experiment from (Wang et al. 2020) |
| Dataset Splits | Yes | For each dataset we used the standard train / validation / test splits, which are completely disjoint in terms of contained classes. Only episodes generated from the training split were used for meta-training; the hyperparameters and the best model were chosen using the validation split; and test split was used for measuring performance. As in (Lee et al. 2019), we use 1000 batches per training epoch, 2000 episodes for validation, and 1000 episodes for testing. |
| Hardware Specification | Yes | On a single NVidia K40 GPU, our running times are: 1.15s/batch in 1-stage Star Net training; 2.2 s/batch in 2-stage Star Net training (in same settings (Lee et al. 2019) trains in 2.1s/batch); and 0.01s per query in inference. GPU peak memory was 30MB per image. |
| Software Dependencies | Yes | Our implementation is in PyTorch 1.1.0 (Paszke et al. 2017), and is based on the public code of (Lee et al. 2019). |
| Experiment Setup | Yes | We use four 1-shot, 5-way episodes per training batch, each episode with 20 queries. The hyper-parameters σf = 0.2, σg = 2, and η = 0.5 were determined using validation. As in (Lee et al. 2019), we use 1000 batches per training epoch, 2000 episodes for validation, and 1000 episodes for testing. We train for 60 epochs, changing our base LR = 1 to 0.06, 0.012, 0.0024 at epochs 20, 40, 50 respectively. |
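The training schedule quoted in the Experiment Setup row can be sketched as a simple step function. This is a minimal illustration of the reported numbers only; the function name and constants below are ours, not taken from the released code:

```python
def lr_at_epoch(epoch,
                base_lr=1.0,
                milestones=(20, 40, 50),
                values=(0.06, 0.012, 0.0024)):
    """Learning rate under the reported step schedule:
    base LR 1.0, dropped to 0.06 / 0.012 / 0.0024 at
    epochs 20 / 40 / 50, over 60 training epochs."""
    lr = base_lr
    for milestone, value in zip(milestones, values):
        if epoch >= milestone:
            lr = value
    return lr

# Episode configuration reported in the paper:
# four 1-shot, 5-way episodes per batch, 20 queries each.
EPISODES_PER_BATCH = 4
N_WAY, K_SHOT, N_QUERY = 5, 1, 20

print(lr_at_epoch(0))   # 1.0 (base LR)
print(lr_at_epoch(25))  # 0.06
print(lr_at_epoch(55))  # 0.0024
```

In practice a schedule like this would typically be expressed with a framework helper (e.g. a multi-step LR scheduler in PyTorch); the standalone function above just makes the reported drop points explicit.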