Revisit Finetuning strategy for Few-Shot Learning to Transfer the Embeddings

Authors: Heng Wang, Tan Yue, Xiang Ye, Zihang He, Bohan Li, Yong Li

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show the effectiveness of the designed LP-FT-FB, we conducted comprehensive experiments on the commonly used FSL datasets under different backbones for in-domain and cross-domain FSL tasks. The experimental results show that the proposed LP-FT-FB outperforms the SOTA FSL methods.
Researcher Affiliation | Academia | Anonymous authors; paper under double-blind review. No affiliations are provided, so classification is not possible.
Pseudocode | No | The provided text does not contain any explicit pseudocode or algorithm blocks. It mentions that 'The whole flow of LP-FT-FB is given in the Appendix', but the Appendix content is not included here.
Open Source Code | Yes | The code is available at https://github.com/whzyf951620/Linear_Probing_Finetuning_Firth_Bias.
Open Datasets | Yes | The experiments are evaluated on three typical FSL datasets: mini-ImageNet (Vinyals et al., 2016), tiered-ImageNet (Ren et al., 2018), and CUB (Wah et al., 2011).
Dataset Splits | Yes | mini-ImageNet consists of 100 classes from ImageNet, split randomly into 64 base, 16 validation, and 20 novel classes. tiered-ImageNet consists of 608 classes from ImageNet, split randomly into 351 base, 97 validation, and 160 novel classes. CUB contains 200 classes with a total of 11,788 images of size 84 × 84; its base, validation, and novel splits contain 100, 50, and 50 classes, respectively. (A configuration sketch of these splits is given after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers such as SGD and references other models, but it does not specify any software dependencies with version numbers (e.g., specific Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For LP, we used the linear classifier proposed in Baseline++ (Chen et al., 2019). For the optimizer, SGD is used with learning rate α1 = 0.01, momentum 0.9, dampening 0.9, and weight decay 1e-3. For the FBR of the classifier, the factor λ in Eq. 2 is set to 1. For FT, the feature extractor and the classifier are finetuned together with learning rate α2 = 1e-3, and the i-FBR factor λ_inv in Eq. 7 is set to 1e-3. (A hedged code sketch of this setup follows the table.)
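
For concreteness, the class splits from the Dataset Splits row can be summarized in a small configuration sketch. This is illustrative only: the dataset names and class counts come from the report above, while the dictionary layout and names (`FSL_SPLITS`, `TOTALS`) are our own.

```python
# Class-count summary of the base/validation/novel splits reported above.
# The split assignments themselves are random per the paper; only the
# counts are fixed.
FSL_SPLITS = {
    "mini-ImageNet":   {"base": 64,  "val": 16, "novel": 20},   # 100 classes
    "tiered-ImageNet": {"base": 351, "val": 97, "novel": 160},  # 608 classes
    "CUB":             {"base": 100, "val": 50, "novel": 50},   # 200 classes
}

TOTALS = {"mini-ImageNet": 100, "tiered-ImageNet": 608, "CUB": 200}

# Sanity check: per-split counts must add up to each dataset's class total.
for name, split in FSL_SPLITS.items():
    assert sum(split.values()) == TOTALS[name], name
```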
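The Experiment Setup row pins down the optimizer hyperparameters but not the loss terms themselves (Eqs. 2 and 7 are not reproduced in this report). The sketch below is a minimal PyTorch rendering of the two-stage LP-FT procedure under stated assumptions: the Firth bias-reduction (FBR) penalty is taken to be the negative mean log-probability over classes, a common form in the FSL literature; the same term stands in for the i-FBR of Eq. 7; and a plain linear head replaces the Baseline++ cosine classifier for brevity. Names such as `firth_term`, `lp_step`, and `ft_step` are our own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder backbone standing in for the pretrained feature extractor
# (e.g., a ResNet-12 producing 640-dim embeddings of 84x84 images).
feature_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 640))
classifier = nn.Linear(640, 5)  # 5-way episode; Baseline++ actually uses a
                                # cosine classifier, simplified here.

def firth_term(logits):
    # Assumed FBR penalty: -1/(N*C) * sum_ij log p_ij.
    return -F.log_softmax(logits, dim=1).mean()

# Stage 1: linear probing (LP) -- backbone frozen, FBR factor lambda = 1.
opt_lp = torch.optim.SGD(classifier.parameters(), lr=0.01,
                         momentum=0.9, dampening=0.9, weight_decay=1e-3)
lam = 1.0

def lp_step(images, labels):
    with torch.no_grad():                    # backbone stays frozen in LP
        feats = feature_extractor(images)
    logits = classifier(feats)
    loss = F.cross_entropy(logits, labels) + lam * firth_term(logits)
    opt_lp.zero_grad(); loss.backward(); opt_lp.step()

# Stage 2: finetuning (FT) -- backbone and classifier updated jointly with
# lr = 1e-3; the i-FBR factor lambda_inv = 1e-3 weights the (assumed) penalty.
opt_ft = torch.optim.SGD(
    list(feature_extractor.parameters()) + list(classifier.parameters()),
    lr=1e-3)
lam_inv = 1e-3

def ft_step(images, labels):
    logits = classifier(feature_extractor(images))
    # Stand-in for Eq. 7; the exact i-FBR form is not given in this report.
    loss = F.cross_entropy(logits, labels) + lam_inv * firth_term(logits)
    opt_ft.zero_grad(); loss.backward(); opt_ft.step()
```

A single episode would run `lp_step` on the support set for some number of iterations before switching to `ft_step`; the iteration counts are not given in this report, so none are assumed here.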