Variational Metric Scaling for Metric-Based Meta-Learning
Authors: Jiaxin Chen, Li-Ming Zhan, Xiao-Ming Wu, Fu-lai Chung
AAAI 2020, pp. 3478-3485 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on miniImageNet show that our methods can be used to consistently improve the performance of existing metric-based meta-algorithms. To evaluate our methods, we plug them into two popular algorithms, prototypical networks (PN) (Snell, Swersky, and Zemel 2017) and TADAM (Oreshkin, López, and Lacoste 2018), implemented by both Conv-4 and ResNet-12 backbone networks. To be elaborated later, Table 1 shows our main results in comparison to state-of-the-art meta-algorithms. |
| Researcher Affiliation | Academia | Jiaxin Chen, Li-Ming Zhan, Xiao-Ming Wu, Fu-lai Chung, Department of Computing, The Hong Kong Polytechnic University. {jiax.chen, lmzhan.zhan}@connect.polyu.hk, xiao-ming.wu@polyu.edu.hk, cskchung@comp.polyu.edu.hk |
| Pseudocode | Yes | Algorithm 1: Stochastic Variational Scaling for Prototypical Networks; Algorithm 2: Dimensional Amortized Variational Scaling for Prototypical Networks (a minimal sketch of Algorithm 1 is given after the table) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that open-source code for the methodology is provided. |
| Open Datasets | Yes | The miniImageNet (Vinyals et al. 2016) consists of 100 classes with 600 images per class. We follow the data split suggested by Ravi and Larochelle (2017), where the dataset is separated into a training set with 64 classes, a testing set with 20 classes and a validation set with 16 classes. |
| Dataset Splits | Yes | The miniImageNet (Vinyals et al. 2016) consists of 100 classes with 600 images per class. We follow the data split suggested by Ravi and Larochelle (2017), where the dataset is separated into a training set with 64 classes, a testing set with 20 classes and a validation set with 16 classes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. It mentions model architectures like 'Conv-4' and 'ResNet-12' but not the underlying computational resources (e.g., specific GPUs or CPUs). |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'SGD optimizer' but does not specify version numbers for these optimizers or any other software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | For Conv-4, we use Adam optimizer with a learning rate of 1e-3 without weight decay. The total number of training episodes is 20,000 for Conv-4. And for ResNet-12, we use SGD optimizer with momentum 0.9, weight decay 4e-4 and 45,000 episodes in total. The learning rate is initialized as 0.1 and decayed 90% at episode steps 15,000, 30,000 and 35,000. Besides, we use gradient clipping when training ResNet-12. The prior distribution of the metric scaling parameter is set as p(α) = N(1, 1) and the variational parameters are initialized as μ_init = 100, σ_init = 0.2. The learning rate is set to be l_ψ = 1e-4. The learning rate for D-SVS is set to be l_ψ = 16. We use a multi-layer perceptron (MLP) with one hidden layer as the generator G_β. The learning rate l_β is set to be 1e-3. (A configuration sketch follows the table.) |
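The Pseudocode row refers to Algorithm 1 (Stochastic Variational Scaling for Prototypical Networks). Since no code is released, the following is only a minimal sketch of how such an episode loss could be assembled, assuming a generic `encoder`, scalar variational parameters `mu` and `log_sigma`, a reparameterized sample of the scale α, and a `kl_weight` factor; these names and the exact weighting of the KL term against the prior N(1, 1) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of stochastic variational scaling (SVS) applied to
# prototypical-network logits; `encoder`, `mu`, `log_sigma`, and `kl_weight`
# are assumed names, not the authors' code.
import torch
import torch.nn.functional as F

def svs_episode_loss(encoder, support, support_labels, query, query_labels,
                     mu, log_sigma, n_way, kl_weight=1.0):
    """One Prototypical Networks episode with a sampled metric scale alpha."""
    # Class prototypes: mean embedding of each class's support examples.
    z_support = encoder(support)                      # [N_s, D]
    z_query = encoder(query)                          # [N_q, D]
    prototypes = torch.stack([
        z_support[support_labels == k].mean(dim=0) for k in range(n_way)
    ])                                                # [n_way, D]

    # Reparameterized sample alpha ~ q(alpha) = N(mu, sigma^2).
    sigma = log_sigma.exp()
    alpha = mu + sigma * torch.randn_like(sigma)

    # Scaled negative squared Euclidean distances as class logits.
    dists = torch.cdist(z_query, prototypes).pow(2)   # [N_q, n_way]
    logits = -alpha * dists

    # Closed-form KL(q(alpha) || p(alpha)) with prior p(alpha) = N(1, 1).
    kl = 0.5 * (sigma.pow(2) + (mu - 1.0).pow(2) - 1.0) - log_sigma

    return F.cross_entropy(logits, query_labels) + kl_weight * kl
```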
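The Experiment Setup row reports Adam with learning rate 1e-3 and no weight decay for Conv-4, variational parameters initialized as μ_init = 100 and σ_init = 0.2, and a separate learning rate l_ψ = 1e-4 for the variational parameters. The sketch below shows one plausible way to wire those reported values together; the Conv-4 stand-in architecture and the parameter-group layout are assumptions made only to make the configuration concrete.

```python
# Illustrative optimizer configuration for the reported Conv-4 setting.
import math
import torch
import torch.nn as nn

# Simplified stand-in for the Conv-4 backbone (four conv blocks, 64 filters
# each); this architecture is an assumption, shown only to make the
# parameter groups concrete.
def conv_block(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(), nn.MaxPool2d(2))

encoder = nn.Sequential(conv_block(3, 64), conv_block(64, 64),
                        conv_block(64, 64), conv_block(64, 64))

# Variational parameters initialized as mu_init = 100, sigma_init = 0.2,
# as reported in the Experiment Setup row.
mu = torch.tensor(100.0, requires_grad=True)
log_sigma = torch.tensor(math.log(0.2), requires_grad=True)

# Adam for the Conv-4 backbone (lr 1e-3, no weight decay) and a separate
# learning rate l_psi = 1e-4 for the variational parameters.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 1e-3, "weight_decay": 0.0},
    {"params": [mu, log_sigma], "lr": 1e-4},
])
```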