Accelerating Convergence in Bayesian Few-Shot Classification

Authors: Tianjun Ke, Haoqun Cao, Feng Zhou

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate competitive classification accuracy, improved uncertainty quantification, and faster convergence compared to baseline models.
Researcher Affiliation | Academia | (1) Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China; (2) Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing.
Pseudocode | Yes | Algorithm 1: Mirror Descent based Bayesian Few-Shot Classification. (A Python sketch of this loop structure appears after the table.)
  Training:
    Input: input features and class labels for S tasks: {X^s}_{s=1}^S, {y^s}_{s=1}^S
    Output: GP kernel hyperparameter η
    Initialize GP kernel hyperparameter η and variational parameters θ̃_0 = 0 and θ_1 = η;
    for Iteration do
      for Task s do
        # Update task-specific parameters
        for Step t do
          Update θ̃^s_t by Equation (4) and Section 3.3;
          Update θ^s_{t+1} = θ̃^s_t + η;
        end
        # Update task-common parameters
        Update η by Equation (5).
      end
    end
  Test:
    Input: support set S = {X, y}; query set Q = X_*; learned hyperparameter η̂
    Output: predicted labels
    Initialize variational parameters θ̃_0 = 0 and θ_1 = η̂;
    # Update task-specific parameters
    for Step t do
      Update θ̃_t by Equation (4) and Section 3.3;
      Update θ_{t+1} = θ̃_t + η̂;
    end
    # Predict labels
    for x ∈ X_* do
      Predict y by Equation (6).
    end
Open Source Code | Yes | Code is publicly available at https://github.com/keanson/MD-BSFC.
Open Datasets | Yes | We address three challenging tasks using benchmark datasets, including Caltech-UCSD Birds (Wah et al., 2011), miniImageNet (Ravi & Larochelle, 2017), Omniglot (Lake et al., 2011), and EMNIST (Cohen et al., 2017).
Dataset Splits | Yes | The standard split of 100 training, 50 validation, and 50 test classes is employed (Snell & Zemel, 2021). ... We employed the common split of 64 training, 16 validation, and 20 test classes as well (Snell & Zemel, 2021). ... In the domain transfer task, we utilize 31 classes for validation and the others for test.
Hardware Specification | Yes | We use one Quadro RTX 6000 to run each method.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | The Adam optimizer, with a standard learning rate of 10^-3 for the neural network and a learning rate of 10^-4 for the other kernel parameters, is employed across all our experiments in the outer loop. For a single epoch, 100 random episodes are sampled from the complete dataset for all methods. For the variational inference steps, we run 3 steps with ρ = 1 during training and 50 steps with ρ = 0.5 during testing. (See the optimizer-configuration sketch after the table.)
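
For readers who want a concrete picture of the control flow in Algorithm 1, the Python sketch below mirrors the training and test phases. It is a minimal illustration under stated assumptions, not the authors' implementation: mirror_step, hyper_step, and predict are hypothetical stand-ins for Equations (4)-(6), which are not reproduced in this summary, and the dummy least-squares objective inside them carries no GP semantics. Only the loop structure (θ̃_0 = 0, θ_1 = η, θ_{t+1} = θ̃_t + η, task-common update of η) follows the pseudocode.

```python
# Minimal sketch of Algorithm 1's control flow, assuming placeholder updates.
# mirror_step / hyper_step / predict stand in for Equations (4)-(6); their
# bodies are dummy computations with no GP semantics.
import torch

def mirror_step(theta_tilde, theta, x, y, rho):
    """Stand-in for Eq. (4): one mirror-descent-style step of size rho."""
    theta = theta.detach().requires_grad_(True)
    dummy_loss = ((x @ theta - y) ** 2).mean()      # placeholder objective
    (grad,) = torch.autograd.grad(dummy_loss, theta)
    return theta_tilde - rho * grad

def hyper_step(eta, theta, lr=1e-4):
    """Stand-in for Eq. (5): task-common GP hyperparameter update."""
    return eta + lr * (theta.detach() - eta)

def predict(theta, query_x):
    """Stand-in for Eq. (6): dummy predictive scores for the query inputs."""
    return query_x @ theta

def adapt(x, y, eta, n_steps, rho):
    """Task-specific loop: theta_tilde_0 = 0, theta_1 = eta."""
    theta_tilde = torch.zeros_like(eta)
    theta = eta.clone()
    for _ in range(n_steps):
        theta_tilde = mirror_step(theta_tilde, theta, x, y, rho)
        theta = theta_tilde + eta                   # theta_{t+1} = theta_tilde_t + eta
    return theta

def train(episodes, eta, n_steps=3, rho=1.0):
    """Outer loop over sampled episodes (tasks)."""
    for x, y in episodes:
        theta = adapt(x, y, eta, n_steps, rho)
        eta = hyper_step(eta, theta)                # task-common update
    return eta

def meta_test(support_x, support_y, query_x, eta_hat, n_steps=50, rho=0.5):
    theta = adapt(support_x, support_y, eta_hat, n_steps, rho)
    return predict(theta, query_x)
```

The default step counts and ρ values above (3 steps with ρ = 1 at train time; 50 steps with ρ = 0.5 at test time) simply echo the experiment setup reported in the last table row.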
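
The outer-loop optimizer described in the experiment setup (Adam with 10^-3 for the neural network and 10^-4 for the remaining kernel parameters) maps naturally onto PyTorch parameter groups. The snippet below only illustrates that configuration; the names feature_extractor and kernel_log_params are hypothetical and not taken from the released code.

```python
import torch

# Hypothetical modules/parameters; names are illustrative only.
feature_extractor = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
)
kernel_log_params = torch.nn.Parameter(torch.zeros(2))  # e.g. log-lengthscale, log-variance

# Adam with the two learning rates reported in the experiment setup.
optimizer = torch.optim.Adam([
    {"params": feature_extractor.parameters(), "lr": 1e-3},  # neural network
    {"params": [kernel_log_params], "lr": 1e-4},             # other kernel parameters
])

# One epoch samples 100 random episodes; each episode runs the inner
# variational steps (3 with rho = 1 at train time) before optimizer.step().
```

At test time no outer-loop update is performed; only the inner variational loop runs, with 50 steps and ρ = 0.5 as stated above.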