FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning
Authors: Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, FoPro is trained on web datasets, guided by a few real-world examples, and evaluated on real-world datasets. Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets. Experiments: We train FoPro on web datasets and evaluate it on real-world testing sets. FoPro boosts K-shot performance and reaches the SOTA. An ablation study validates the relation module. |
| Researcher Affiliation | Collaboration | Yulei Qin*1, Xingyu Chen*1, Chao Chen1, Yunhang Shen1, Bo Ren1, Yun Gu2, Jie Yang2, Chunhua Shen3 1Tencent YouTu Lab 2Shanghai Jiao Tong University 3Zhejiang University |
| Pseudocode | No | The paper describes the model architecture and training strategy in detail with equations and a diagram (Figure 3), but it does not provide a formal pseudocode block or algorithm section. |
| Open Source Code | Yes | Code is available at https://github.com/yuleiqin/fopro. |
| Open Datasets | Yes | WebFG496 (Sun et al. 2021) contains three fine-grained datasets sourced from Bing. The testing sets of CUB200-2011 (Wah et al. 2011), FGVC-Aircraft (Maji et al. 2013), and Stanford Car (Krause et al. 2013) are used. WebVision1k (Li et al. 2017) is collected from Google and Flickr. The validation set of ImageNet1k (Deng et al. 2009) is used. Besides, we also use Google500 (Yang et al. 2020), where 500 out of 1k categories are randomly sampled with images only from Google (see Table 1). |
| Dataset Splits | Yes | The validation set of ImageNet1k (Deng et al. 2009) is used. We randomly sample K shots per class from the training sets of real-world datasets (a minimal K-shot sampling sketch follows the table). |
| Hardware Specification | Yes | Experiments are conducted on a CentOS 7 workstation with an Intel 8255C CPU, 377 GB memory, and 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'Adam' and 'SGD' optimizers, and refers to 'PyTorch' for R50 training, but it does not specify version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | WebFG496: The B-CNN (Lin, RoyChowdhury, and Maji 2015) (VGG-16 (Simonyan and Zisserman 2014)) is used as the encoder. We refer to (Sun et al. 2021) for the training settings: the optimizer is Adam with a weight decay of 1×10⁻⁸; the batch size is 64; the learning rate is 1×10⁻⁴ and decays to 0 by a cosine schedule; a warm-up policy increases the learning rate linearly for 5 epochs with the encoders frozen. WebVision1k: The ResNet-50 (R50) (He et al. 2016) is used as the encoder. We refer to (Yang et al. 2020) for the training settings: the batch size is 256; the optimizer is SGD with a momentum of 0.9 and a weight decay of 1×10⁻⁴; the learning rate is 0.01 and decays to 0 by a cosine schedule. We refer to MoPro to set m_e = 0.999, m_p = 0.999, d_p = 128, and Q = 8192. In view of the dataset scale, we set T1 = 20, T2 = 5, T3 = 20, T4 = 175 for WebFG496 and T1 = 15, T2 = 5, T3 = 10, T4 = 30 for WebVision1k/Google500. Preliminary experiments on WebFG496 show that γ = 0.6 and β = 0.5 work better than γ = 0.2 and β = 0, 0.25, 0.75, 1. Other hyper-parameters are empirically set as: δ_w = 0, δ_t = 0.5, τ = 0.1, α = 10, σ = 20. Data augmentation includes random cropping and horizontal flipping. Strong augmentation on the inputs to the momentum encoder (He et al. 2020) additionally uses color jittering and blurring; since bird species might only differ in color, random rotation within 45 degrees is used instead. (A minimal configuration sketch follows the table.) |
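For reference, below is a minimal PyTorch sketch of the WebVision1k/Google500 training configuration summarized in the Experiment Setup row. The variable names, the total epoch count, and the exact augmentation parameters (crop size, jitter strengths, blur kernel size) are assumptions for illustration only; the authors' released code at https://github.com/yuleiqin/fopro is the authoritative reference.

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet50

# Reported hyper-parameters for WebVision1k/Google500 (variable names are hypothetical).
BATCH_SIZE = 256
BASE_LR = 0.01
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
TOTAL_EPOCHS = 60  # assumed sum of the stage epochs T1+T2+T3+T4 = 15+5+10+30

# Weak augmentation: random cropping and horizontal flipping.
weak_aug = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# Strong augmentation for the momentum-encoder inputs additionally uses
# color jittering and blurring (exact strengths assumed here).
strong_aug = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.GaussianBlur(kernel_size=23),
    T.ToTensor(),
])

# ResNet-50 encoder, SGD with momentum, and a cosine schedule that
# decays the learning rate to 0 over training.
encoder = resnet50(num_classes=1000)
optimizer = torch.optim.SGD(
    encoder.parameters(), lr=BASE_LR, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TOTAL_EPOCHS)
```

The WebFG496 setting instead uses a VGG-16-based B-CNN encoder, Adam (weight decay 1×10⁻⁸, learning rate 1×10⁻⁴), batch size 64, and a 5-epoch linear warm-up with the encoders frozen.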
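The K-shot guidance mentioned under Dataset Splits amounts to drawing K labeled examples per class from the real-world training set. A minimal sketch, assuming the training set is available as a list of (path, label) pairs (a hypothetical interface, not the authors' loader):

```python
import random
from collections import defaultdict

def sample_k_shots(samples, k, seed=0):
    """Randomly sample K examples per class from a real-world training set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:  # samples: list of (image_path, class_label) pairs
        by_class[label].append(path)
    # Return K randomly chosen examples for each class.
    return {label: rng.sample(paths, k) for label, paths in by_class.items()}
```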