Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning
Authors: Konstantinos Kalais, Sotirios Chatzis
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed model in various few-shot learning tasks: image classification, sinusoidal regression and active learning. |
| Researcher Affiliation | Academia | 1Dept. of Electrical Eng., Computer Eng., and Informatics, Cyprus University of Technology, Limassol, Cyprus. Correspondence to: Konstantinos Kalais <ki.kalais@cut.ac.cy>, Sotirios Chatzis <sotirios.chatzis@cut.ac.cy>. |
| Pseudocode | Yes | Algorithm 1: Model training with StochLWTA-ML. A hedged sketch of this training loop appears below the table. |
| Open Source Code | Yes | Code is available at: https://github.com/Kkalais/StochLWTA-ML |
| Open Datasets | Yes | We first evaluate StochLWTA-ML on popular few-shot image classification datasets, and compare its performance to state-of-the-art prior results. In Table 1, we show how StochLWTA-ML performs on Omniglot 20-way (Lake et al., 2017), Mini-Imagenet 5-way (Vinyals et al., 2016) and CIFAR-100 5-way (Krizhevsky, 2009) few-shot settings. |
| Dataset Splits | No | Appendix A describes the training and testing splits for Omniglot, Mini-Imagenet, and CIFAR-100 (e.g., 'The ratio between training and testing sets is 3:2...'), but no explicit separate validation dataset split is mentioned. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. Only the software framework 'Tensorflow' is mentioned. |
| Software Dependencies | No | The paper states 'The code was implemented in Tensorflow (Abadi et al., 2016).' but does not provide specific version numbers for Tensorflow or any other software dependencies. |
| Experiment Setup | Yes | After thorough exploration of the number of LWTA layers, the number of blocks per layer, and the competing units per block, we end up using networks comprising 2 layers: 16 blocks with 2 competing units per block on the former layer, and 8 blocks with 2 units per block on the latter. The last network layer is a Softmax. Weight mean initialization, as well as point-estimate initialization for our competitors, is performed via Glorot Uniform. Weight log-variance initialization is performed via Glorot Normal, by sampling from N(0.0005, 0.01). The Gumbel-Softmax relaxation temperature is set to τ = 0.67. In the inner-loop updates, we use the Stochastic Gradient Descent (SGD) (Robbins, 2007) optimizer with a learning rate of 0.003. For the outer loop, we use SGD with an outer step size linearly annealed to 0 from an initial value of 0.25. Additionally, all experiments were run with a task batch size of 50 in both training and testing mode. Prediction is carried out by averaging over B = 4 output logits. Hedged sketches of the LWTA layer and of the inner/outer-loop updates are given below the table. |
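
The architecture reported in the Experiment Setup row (stacked LWTA layers whose units compete in blocks, with winners drawn via a Gumbel-Softmax relaxation at τ = 0.67) can be illustrated with a minimal NumPy sketch. This is not the authors' TensorFlow implementation; the function names (`gumbel_softmax_sample`, `stochastic_lwta_layer`), the plain matrix-multiply parameterization, and the toy input sizes are illustrative assumptions.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=0.67, rng=None):
    """Relaxed one-hot sample over the last axis (Gumbel-Softmax)."""
    if rng is None:
        rng = np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(1e-20, 1.0, size=logits.shape)))
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

def stochastic_lwta_layer(x, W, n_blocks, units_per_block, tau=0.67):
    """Linear units grouped into blocks; one (soft) winner per block survives."""
    h = x @ W                                    # (batch, n_blocks * units_per_block)
    h = h.reshape(-1, n_blocks, units_per_block)
    winners = gumbel_softmax_sample(h, tau)      # relaxed one-hot over each block
    out = h * winners                            # losing units are (softly) zeroed out
    return out.reshape(-1, n_blocks * units_per_block)

# Illustrative 2-layer configuration from the paper: 16x2 then 8x2 competing units.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 64))                     # toy features; real inputs differ
W1 = 0.1 * rng.normal(size=(64, 32))             # 16 blocks * 2 units
W2 = 0.1 * rng.normal(size=(32, 16))             # 8 blocks * 2 units
h1 = stochastic_lwta_layer(x, W1, n_blocks=16, units_per_block=2)
h2 = stochastic_lwta_layer(h1, W2, n_blocks=8, units_per_block=2)
```

A Softmax output layer would follow the second LWTA layer, as the quoted setup describes.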
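Likewise, Algorithm 1 (model training with StochLWTA-ML) follows the MAML-style inner/outer-loop pattern with the hyperparameters quoted above: inner-loop SGD at 0.003, an outer step size starting at 0.25 and linearly annealed to 0, and task batches of 50. The sketch below is a first-order approximation for illustration only; `loss_and_grad` is a hypothetical helper, not part of the released code.

```python
import numpy as np

def maml_train_step(params, tasks, loss_and_grad,
                    inner_lr=0.003, outer_lr=0.25, inner_steps=1):
    """One outer-loop update in the spirit of Algorithm 1 (first-order sketch).

    `loss_and_grad(params, data)` is a hypothetical helper returning
    (loss, list_of_gradients) for the stochastic LWTA network.
    """
    meta_grad = [np.zeros_like(p) for p in params]
    for support, query in tasks:                 # e.g., a batch of 50 tasks
        adapted = [p.copy() for p in params]
        for _ in range(inner_steps):             # inner-loop SGD on the support set
            _, grads = loss_and_grad(adapted, support)
            adapted = [p - inner_lr * g for p, g in zip(adapted, grads)]
        # First-order approximation: query-set gradients at the adapted
        # parameters are accumulated directly into the meta-gradient.
        _, grads = loss_and_grad(adapted, query)
        meta_grad = [m + g for m, g in zip(meta_grad, grads)]
    # Outer-loop SGD step; outer_lr is linearly annealed towards 0 over training.
    return [p - outer_lr * (m / len(tasks)) for p, m in zip(params, meta_grad)]
```

At test time, the quoted setup averages over B = 4 stochastic forward passes (output logits) of the adapted network to form predictions.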