Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning

Authors: Konstantinos Kalais, Sotirios Chatzis

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed model in various few-shot learning tasks: image classification, sinusoidal regression and active learning.
Researcher Affiliation | Academia | Dept. of Electrical Eng., Computer Eng., and Informatics, Cyprus University of Technology, Limassol, Cyprus. Correspondence to: Konstantinos Kalais <ki.kalais@cut.ac.cy>, Sotirios Chatzis <sotirios.chatzis@cut.ac.cy>.
Pseudocode | Yes | Algorithm 1: Model training with StochLWTA-ML
Open Source Code | Yes | Code is available at: https://github.com/Kkalais/StochLWTA-ML
Open Datasets | Yes | We first evaluate StochLWTA-ML on popular few-shot image classification datasets, and compare its performance to state-of-the-art prior results. In Table 1, we show how StochLWTA-ML performs on Omniglot 20-way (Lake et al., 2017), Mini-Imagenet 5-way (Vinyals et al., 2016) and CIFAR-100 5-way (Krizhevsky, 2009) few-shot settings.
Dataset Splits | No | Appendix A describes the training and testing splits for Omniglot, Mini-Imagenet, and CIFAR-100 (e.g., 'The ratio between training and testing sets is 3:2...'), but no separate validation split is explicitly mentioned.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are provided in the paper. Only the software framework 'Tensorflow' is mentioned.
Software Dependencies | No | The paper states 'The code was implemented in Tensorflow (Abadi et al., 2016).' but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | After thorough exploration of the number of LWTA layers, as well as the number of blocks per layer and competing units per block, we end up using networks comprising 2 layers, with 16 blocks and 2 competing units per block in the former layer, and 8 blocks with 2 units per block in the latter. The last network layer is a Softmax. Weight mean initialization, as well as point-estimate initialization for our competitors, is performed via Glorot Uniform. Weight log-variance initialization is performed via Glorot Normal, by sampling from N(0.0005, 0.01). The Gumbel-Softmax relaxation temperature is set to τ = 0.67. In the inner-loop updates, we use the Stochastic Gradient Descent (SGD) (Robbins, 2007) optimizer with a learning rate of 0.003. For the outer loop, we use SGD with an outer step size linearly annealed to 0 from an initial value of 0.25. Additionally, all experiments were run with a task batch size of 50 in both training and testing modes. Prediction is carried out by averaging over B = 4 output logits.
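To make the quoted configuration concrete, the following is a minimal sketch, assuming dense LWTA layers with Gumbel-Softmax competition implemented as a custom Keras layer; it is not the authors' released code (see the repository linked above). Names such as `LWTALayer` and `build_stoch_lwta_net`, the `decay_steps` value, and the use of point-estimate kernels (the paper's variational mean/log-variance weight treatment is omitted for brevity) are illustrative assumptions.

```python
# Sketch only: dense LWTA blocks with Gumbel-Softmax competition, using the
# hyperparameters quoted in the setup (16x2 and 8x2 blocks, tau = 0.67,
# Glorot Uniform means, inner SGD lr 0.003, outer SGD annealed from 0.25).
import tensorflow as tf


class LWTALayer(tf.keras.layers.Layer):
    """Linear layer whose units are grouped into blocks of competing units.

    Within each block, a relaxed one-hot sample drawn via the Gumbel-Softmax
    trick over the block's linear responses gates the outputs, so roughly one
    unit per block "wins" and the others are suppressed.
    """

    def __init__(self, num_blocks, units_per_block, temperature=0.67, **kwargs):
        super().__init__(**kwargs)
        self.num_blocks = num_blocks
        self.units_per_block = units_per_block
        self.temperature = temperature

    def build(self, input_shape):
        out_dim = self.num_blocks * self.units_per_block
        # Glorot Uniform initialization for the weight means, as quoted above.
        self.kernel = self.add_weight(
            name="kernel",
            shape=(int(input_shape[-1]), out_dim),
            initializer="glorot_uniform",
        )
        self.bias = self.add_weight(name="bias", shape=(out_dim,), initializer="zeros")

    def call(self, inputs):
        logits = tf.matmul(inputs, self.kernel) + self.bias
        blocked = tf.reshape(logits, (-1, self.num_blocks, self.units_per_block))
        # Gumbel-Softmax relaxation over each block (tau = 0.67): add Gumbel
        # noise to the block responses and take a temperature-scaled softmax.
        uniform = tf.random.uniform(tf.shape(blocked), minval=1e-8, maxval=1.0)
        gumbel = -tf.math.log(-tf.math.log(uniform))
        winners = tf.nn.softmax((blocked + gumbel) / self.temperature, axis=-1)
        gated = blocked * winners
        return tf.reshape(gated, (-1, self.num_blocks * self.units_per_block))


def build_stoch_lwta_net(num_classes):
    """Two LWTA layers (16x2 then 8x2 blocks) followed by a Softmax head."""
    return tf.keras.Sequential([
        LWTALayer(num_blocks=16, units_per_block=2),
        LWTALayer(num_blocks=8, units_per_block=2),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


# Optimizers matching the quoted hyperparameters: SGD in the inner loop
# (lr = 0.003) and SGD with an outer step size linearly annealed from 0.25
# towards 0. The decay_steps value below is a placeholder; the paper only
# states that the outer step size anneals linearly to 0.
inner_opt = tf.keras.optimizers.SGD(learning_rate=0.003)
outer_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=0.25, decay_steps=60_000,
    end_learning_rate=0.0, power=1.0,
)
outer_opt = tf.keras.optimizers.SGD(learning_rate=outer_schedule)
```

Because the winner sampling is stochastic, the quoted prediction rule (averaging over B = 4 output logits) would correspond in this sketch to averaging four stochastic forward passes at test time, e.g. `tf.reduce_mean(tf.stack([model(x) for _ in range(4)]), axis=0)`.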