FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

Authors: Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, FoPro is trained on web datasets, guided by a few real-world examples, and evaluated on real-world datasets. The method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. FoPro boosts K-shot performance and reaches the SOTA; an ablation study validates the relation module.
Researcher Affiliation | Collaboration | Yulei Qin*1, Xingyu Chen*1, Chao Chen1, Yunhang Shen1, Bo Ren1, Yun Gu2, Jie Yang2, Chunhua Shen3 (1 Tencent YouTu Lab, 2 Shanghai Jiao Tong University, 3 Zhejiang University)
Pseudocode | No | The paper describes the model architecture and training strategy in detail with equations and a diagram (Figure 3), but it does not provide a formal pseudocode block or algorithm section. (A speculative sketch of two named training components appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/yuleiqin/fopro.
Open Datasets | Yes | WebFG-496 (Sun et al. 2021) contains three fine-grained datasets sourced from Bing; the testing sets of CUB200-2011 (Wah et al. 2011), FGVC-Aircraft (Maji et al. 2013), and Stanford Cars (Krause et al. 2013) are used. WebVision1k (Li et al. 2017) is collected from Google and Flickr; the validation set of ImageNet1k (Deng et al. 2009) is used. Google500 (Yang et al. 2020) is also used, where 500 of the 1k categories are randomly sampled with images only from Google (see Table 1).
Dataset Splits | Yes | The validation set of ImageNet1k (Deng et al. 2009) is used. K shots per class are randomly sampled from the training sets of the real-world datasets. (A minimal sampling sketch appears after this table.)
Hardware Specification | Yes | Experiments are conducted on a CentOS 7 workstation with an Intel 8255C CPU, 377 GB of memory, and 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions software components such as the Adam and SGD optimizers and refers to PyTorch for the R50 training, but it does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | WebFG-496: the B-CNN (Lin, RoyChowdhury, and Maji 2015) with a VGG-16 backbone (Simonyan and Zisserman 2014) is used as the encoder. Training settings follow (Sun et al. 2021): the optimizer is Adam with a weight decay of 1×10⁻⁸; the batch size is 64; the learning rate is 1×10⁻⁴ and decays to 0 by a cosine schedule; a warm-up policy increases the learning rate linearly for 5 epochs with the encoders frozen. WebVision1k: ResNet-50 (R50) (He et al. 2016) is used as the encoder. Training settings follow (Yang et al. 2020): the batch size is 256; the optimizer is SGD with a momentum of 0.9 and a weight decay of 1×10⁻⁴; the learning rate is 0.01 and decays to 0 by a cosine schedule. Following MoPro, m_e = 0.999, m_p = 0.999, d_p = 128, and Q = 8192. In view of the dataset scale, T1 = 20, T2 = 5, T3 = 20, T4 = 175 for WebFG-496, and T1 = 15, T2 = 5, T3 = 10, T4 = 30 for WebVision1k/Google500. Preliminary experiments on WebFG-496 show that γ = 0.6 and β = 0.5 work better than γ = 0.2 and β = 0, 0.25, 0.75, 1. Other hyper-parameters are empirically set as δ_w = 0, δ_t = 0.5, τ = 0.1, α = 10, σ = 20. Data augmentation includes random cropping and horizontal flipping; strong augmentation on the inputs to the momentum encoder (He et al. 2020) additionally uses color jittering and blurring. Since birds might differ only in color, random rotation within 45 degrees is used instead of color jittering for the bird data. (Sketches of these settings appear after this table.)
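
Since the paper gives its training strategy only as equations and a diagram, the following is a speculative, minimal sketch of two components that the reported hyper-parameters name explicitly: a MoCo-style momentum-encoder update with coefficient m_e and an EMA prototype update with coefficient m_p. It is assembled from the values listed under Experiment Setup, not from the authors' code; function names and signatures are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m_e=0.999):
    """MoCo-style EMA update of the momentum encoder (coefficient m_e)."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m_e).add_(p_q.data, alpha=1.0 - m_e)

@torch.no_grad()
def update_prototypes(prototypes, embeddings, labels, m_p=0.999):
    """EMA update of L2-normalized class prototypes (coefficient m_p).

    prototypes: (num_classes, d_p) tensor; d_p = 128 in the paper.
    embeddings: (batch, d_p) normalized low-dimensional embeddings.
    labels:     (batch,) class indices.
    """
    for z, y in zip(embeddings, labels):
        prototypes[y] = F.normalize(m_p * prototypes[y] + (1.0 - m_p) * z, dim=0)
```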
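
The K-shot split construction ("randomly sample K shots per class from the training sets") is simple to reproduce. Here is a minimal, self-contained sketch; the label list, the value of K, and the random seed are all illustrative assumptions, not details from the paper.

```python
import random
from collections import defaultdict

def sample_k_shots(labels, k, seed=0):
    """Randomly pick K example indices per class from a training set.

    labels: list of class labels, one per training example.
    Returns a list of selected indices, k per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    picked = []
    for indices in by_class.values():
        picked.extend(rng.sample(indices, k))
    return picked

# e.g., few_shot_indices = sample_k_shots(train_labels, k=16)
```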
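
The optimizer, learning-rate, and augmentation settings above map onto standard PyTorch/torchvision components. Below is a minimal sketch of both configurations, assuming a recent torch/torchvision; the epoch total, the plain torchvision backbones, and the jitter/blur strengths are placeholders, not the paper's actual encoders or staged schedule.

```python
import torch
import torchvision
from torch import optim
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR
from torchvision import transforms

NUM_EPOCHS = 60  # placeholder; the paper trains in stages T1..T4

# WebVision1k setting: ResNet-50, SGD(momentum=0.9, weight_decay=1e-4),
# lr 0.01 decaying to 0 on a cosine schedule.
r50 = torchvision.models.resnet50(num_classes=1000)
sgd = optim.SGD(r50.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
sgd_sched = CosineAnnealingLR(sgd, T_max=NUM_EPOCHS, eta_min=0.0)

# WebFG-496 setting: Adam(weight_decay=1e-8), lr 1e-4, 5-epoch linear warm-up
# (with frozen encoders in the paper), then cosine decay to 0. A plain VGG-16
# stands in here for the B-CNN encoder.
vgg = torchvision.models.vgg16(num_classes=200)
adam = optim.Adam(vgg.parameters(), lr=1e-4, weight_decay=1e-8)
adam_sched = SequentialLR(
    adam,
    schedulers=[
        LinearLR(adam, start_factor=0.01, total_iters=5),
        CosineAnnealingLR(adam, T_max=NUM_EPOCHS - 5, eta_min=0.0),
    ],
    milestones=[5],
)

# Weak augmentation: random cropping + horizontal flipping. Strong augmentation
# for the momentum-encoder inputs adds color jittering and blurring; for the
# bird data, rotation within 45 degrees replaces the color jittering.
weak_aug = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
strong_aug = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

for epoch in range(NUM_EPOCHS):
    # ... one training epoch would run here ...
    sgd_sched.step()  # step the matching scheduler once per epoch
```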