On the Importance of Firth Bias Reduction in Few-Shot Classification

Authors: Saba Ghaffari, Ehsan Saleh, David Forsyth, Yu-Xiong Wang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Each entry below gives a reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
In this work, we demonstrate the effectiveness of Firth bias reduction in few-shot classification. Theoretically, Firth bias reduction removes the O(N⁻¹) first-order term from the small-sample bias of the Maximum Likelihood Estimator. Here we show that the general Firth bias reduction technique simplifies to encouraging uniform class-assignment probabilities for multinomial logistic classification, and has almost the same effect on cosine classifiers. We derive an easy-to-implement optimization objective for Firth-penalized multinomial logistic and cosine classifiers, which is equivalent to penalizing the cross-entropy loss with the KL divergence between the uniform label distribution and the predictions. We then empirically show that it is consistently effective across the board for few-shot image classification, regardless of (1) the feature representations from different backbones, (2) the number of samples per class, and (3) the number of classes. Furthermore, we demonstrate the effectiveness of Firth bias reduction in cross-domain and imbalanced-data settings.
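The penalized objective described above is straightforward to implement. Below is a minimal PyTorch sketch, not the authors' released code: the function name firth_penalized_loss and the argument lam (standing in for the cross-validated bias-reduction coefficient) are ours. It uses the identity KL(u || p̂) = -log K - (1/K) Σ_k log p̂_k for the uniform distribution u over K classes.

```python
import math

import torch
import torch.nn.functional as F


def firth_penalized_loss(logits: torch.Tensor, targets: torch.Tensor,
                         lam: float) -> torch.Tensor:
    """Cross-entropy plus a Firth penalty: CE + lam * KL(uniform || softmax)."""
    # Standard cross-entropy over the labeled few-shot samples.
    ce = F.cross_entropy(logits, targets)
    # KL(u || p_hat) for uniform u over K classes reduces to
    # -log(K) - (1/K) * sum_k log p_hat_k, averaged over the batch.
    log_probs = F.log_softmax(logits, dim=-1)
    num_classes = logits.shape[-1]
    kl_uniform = -log_probs.mean(dim=-1).mean() - math.log(num_classes)
    return ce + lam * kl_uniform
```

Setting lam = 0 recovers plain cross-entropy training; per the quoted protocol, the coefficient is cross-validated on the validation classes.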
Researcher Affiliation: Academia
Saba Ghaffari, Ehsan Saleh, David A. Forsyth, Yu-Xiong Wang. Department of Computer Science, University of Illinois Urbana-Champaign. {sabag2, ehsans2, daf, yxw}@illinois.edu
Pseudocode: No
The paper provides mathematical derivations and descriptions of the method but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes
Our implementation is available at https://github.com/ehsansaleh/firth_bias_reduction.
Open Datasets: Yes
Datasets: We perform experiments on four widely used and publicly available benchmarks: mini-ImageNet (Vinyals et al., 2016), CIFAR-FS (Bertinetto et al., 2019), tiered-ImageNet (Ren et al., 2018), and CUB (Wah et al., 2011).
Dataset Splits: Yes
Each dataset consists of non-overlapping base, validation, and novel classes. Following the standard practice (Chen et al., 2019), we train feature backbones on base classes, cross-validate bias-reduction coefficients on validation classes, and train classifiers and measure test accuracy over multiple trials on novel classes. ... We split the validation and novel classes into 90% training and 10% held-out for accuracy evaluation.
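For concreteness, the 90%/10% partition could look like the toy sketch below. This is our reading of the quoted protocol, not the released pipeline; split_class_samples and heldout_frac are hypothetical names, and we assume the split is applied per class over that class's samples.

```python
import numpy as np


def split_class_samples(features: np.ndarray, rng: np.random.Generator,
                        heldout_frac: float = 0.10):
    """Shuffle one class's samples and split them 90% train / 10% held-out."""
    idx = rng.permutation(len(features))
    n_heldout = max(1, int(round(heldout_frac * len(features))))
    return features[idx[n_heldout:]], features[idx[:n_heldout]]


# Example: 50 samples with 64-dim features for a single novel class.
rng = np.random.default_rng(0)
train_feats, heldout_feats = split_class_samples(rng.standard_normal((50, 64)), rng)
```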
Hardware Specification: Yes
Overall, this work consumed more than 32 CPU-years and one Nvidia V100 GPU-year from the NSF-funded resource allocations in the course of its analysis. Also, this work made use of the Illinois Campus Cluster, a computing resource operated by the Illinois Campus Cluster Program (ICCP) in conjunction with the National Center for Supercomputing Applications (NCSA) and supported by funds from the University of Illinois Urbana-Champaign.
Software Dependencies: No
The paper mentions using a 'simple pipeline in the PyTorch library example' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup: Yes
Table A3: Experimental settings used for the standard backbone experiments. The table is partitioned into 5 sections, where the first section shows the global hyper-parameters used in all standard backbone experiments. ... Learning rate: 0.005; mini-batch size: 10; number of classes: 16; optimizer: SGD; train/held-out split: 90%/10%; number of epochs: 400.
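Read as a training recipe, those global hyper-parameters map onto a configuration like the sketch below. Only the quoted values are taken from the paper; the feature dimension, the synthetic data, and all variable names are placeholders, and in practice the Firth penalty from the earlier sketch would be added to the loss.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

NUM_CLASSES = 16   # Table A3
BATCH_SIZE = 10    # Table A3
NUM_EPOCHS = 400   # Table A3
LR = 0.005         # Table A3
FEATURE_DIM = 512  # placeholder: depends on the chosen backbone

# Stand-in for pre-extracted backbone features of the 90% training split.
feats = torch.randn(160, FEATURE_DIM)
labels = torch.arange(160) % NUM_CLASSES
loader = DataLoader(TensorDataset(feats, labels),
                    batch_size=BATCH_SIZE, shuffle=True)

# Multinomial logistic classifier trained on frozen features with SGD.
classifier = torch.nn.Linear(FEATURE_DIM, NUM_CLASSES)
optimizer = torch.optim.SGD(classifier.parameters(), lr=LR)

for epoch in range(NUM_EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        # Plain cross-entropy here; add lam * KL(uniform || softmax)
        # for the Firth-penalized variant.
        loss = F.cross_entropy(classifier(x), y)
        loss.backward()
        optimizer.step()
```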