Variational Metric Scaling for Metric-Based Meta-Learning
Authors: Jiaxin Chen, Li-Ming Zhan, Xiao-Ming Wu, Fu-lai Chung
AAAI 2020, pp. 3478-3485 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on miniImageNet show that our methods can be used to consistently improve the performance of existing metric-based meta-algorithms. To evaluate our methods, we plug them into two popular algorithms, prototypical networks (PN) (Snell, Swersky, and Zemel 2017) and TADAM (Oreshkin, López, and Lacoste 2018), implemented by both Conv-4 and ResNet-12 backbone networks. To be elaborated later, Table 1 shows our main results in comparison to state-of-the-art meta-algorithms. |
| Researcher Affiliation | Academia | Jiaxin Chen, Li-Ming Zhan, Xiao-Ming Wu, Fu-lai Chung, Department of Computing, The Hong Kong Polytechnic University. {jiax.chen, lmzhan.zhan}@connect.polyu.hk, xiao-ming.wu@polyu.edu.hk, cskchung@comp.polyu.edu.hk |
| Pseudocode | Yes | Algorithm 1: Stochastic Variational Scaling for Prototypical Networks; Algorithm 2: Dimensional Amortized Variational Scaling for Prototypical Networks (a minimal sketch of Algorithm 1 is given after the table) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that open-source code for the methodology is provided. |
| Open Datasets | Yes | The miniImageNet (Vinyals et al. 2016) consists of 100 classes with 600 images per class. We follow the data split suggested by Ravi and Larochelle (2017), where the dataset is separated into a training set with 64 classes, a testing set with 20 classes and a validation set with 16 classes. |
| Dataset Splits | Yes | The miniImageNet (Vinyals et al. 2016) consists of 100 classes with 600 images per class. We follow the data split suggested by Ravi and Larochelle (2017), where the dataset is separated into a training set with 64 classes, a testing set with 20 classes and a validation set with 16 classes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It mentions model architectures like 'Conv-4' and 'ResNet-12' but not the underlying computational resources (e.g., specific GPUs or CPUs). |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'SGD optimizer' but does not specify version numbers for these optimizers or any other software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | For Conv-4, we use Adam optimizer with a learning rate of 1e-3 without weight decay. The total number of training episodes is 20,000 for Conv-4. And for ResNet-12, we use SGD optimizer with momentum 0.9, weight decay 4e-4 and 45,000 episodes in total. The learning rate is initialized as 0.1 and decayed 90% at episode steps 15,000, 30,000 and 35,000. Besides, we use gradient clipping when training ResNet-12. The prior distribution of the metric scaling parameter is set as p(α) = N(1, 1) and the variational parameters are initialized as μ_init = 100, σ_init = 0.2. The learning rate is set to be l_ψ = 1e-4. The learning rate for D-SVS is set to be l_ψ = 16. We use a multi-layer perceptron (MLP) with one hidden layer as the generator G_β. The learning rate l_β is set to be 1e-3. (A configuration sketch follows the table.) |
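The Pseudocode row refers to Algorithm 1 (Stochastic Variational Scaling for Prototypical Networks). Since no code is released, the following is only a minimal sketch of how such an episode loss could be assembled, assuming a generic `encoder`, scalar variational parameters `mu` and `log_sigma`, a reparameterized sample of the scale α, and a `kl_weight` factor; these names and the exact weighting of the KL term against the prior N(1, 1) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of stochastic variational scaling (SVS) applied to
# prototypical-network logits; `encoder`, `mu`, `log_sigma`, and `kl_weight`
# are assumed names, not the authors' code.
import torch
import torch.nn.functional as F

def svs_episode_loss(encoder, support, support_labels, query, query_labels,
                     mu, log_sigma, n_way, kl_weight=1.0):
    """One Prototypical Networks episode with a sampled metric scale alpha."""
    # Class prototypes: mean embedding of each class's support examples.
    z_support = encoder(support)                      # [N_s, D]
    z_query = encoder(query)                          # [N_q, D]
    prototypes = torch.stack([
        z_support[support_labels == k].mean(dim=0) for k in range(n_way)
    ])                                                # [n_way, D]

    # Reparameterized sample alpha ~ q(alpha) = N(mu, sigma^2).
    sigma = log_sigma.exp()
    alpha = mu + sigma * torch.randn_like(sigma)

    # Scaled negative squared Euclidean distances as class logits.
    dists = torch.cdist(z_query, prototypes).pow(2)   # [N_q, n_way]
    logits = -alpha * dists

    # Closed-form KL(q(alpha) || p(alpha)) with prior p(alpha) = N(1, 1).
    kl = 0.5 * (sigma.pow(2) + (mu - 1.0).pow(2) - 1.0) - log_sigma

    return F.cross_entropy(logits, query_labels) + kl_weight * kl
```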
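The Experiment Setup row reports Adam with learning rate 1e-3 and no weight decay for Conv-4, variational parameters initialized as μ_init = 100 and σ_init = 0.2, and a separate learning rate l_ψ = 1e-4 for the variational parameters. The sketch below shows one plausible way to wire those reported values together; the Conv-4 stand-in architecture and the parameter-group layout are assumptions made only to make the configuration concrete.

```python
# Illustrative optimizer configuration for the reported Conv-4 setting.
import math
import torch
import torch.nn as nn

# Simplified stand-in for the Conv-4 backbone (four conv blocks, 64 filters
# each); this architecture is an assumption, shown only to make the
# parameter groups concrete.
def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(), nn.MaxPool2d(2))

encoder = nn.Sequential(conv_block(3, 64), conv_block(64, 64),
                        conv_block(64, 64), conv_block(64, 64))

# Variational parameters initialized as mu_init = 100, sigma_init = 0.2,
# as reported in the Experiment Setup row.
mu = torch.tensor(100.0, requires_grad=True)
log_sigma = torch.tensor(math.log(0.2), requires_grad=True)

# Adam for the Conv-4 backbone (lr 1e-3, no weight decay) and a separate
# learning rate l_psi = 1e-4 for the variational parameters.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-3, "weight_decay": 0.0},
    {"params": [mu, log_sigma], "lr": 1e-4},
])
```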