Empirical Bayes Transductive Meta-Learning with Synthetic Gradients
Authors: Shell Xu Hu, Pablo Garcia Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil Lawrence, Andreas Damianou
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results on the MiniImageNet and CIFAR-FS benchmarks for episodic few-shot classification outperform previous state-of-the-art methods. Besides, we conduct two zero-shot learning experiments to further explore the potential of the synthetic gradient. |
| Researcher Affiliation | Collaboration | (1) École des Ponts ParisTech, Champs-sur-Marne, France ({xu.hu, yang.xiao, xi.shen}@enpc.fr); (2) Amazon, Cambridge, United Kingdom ({morepabl, damianou}@amazon.com); (3) Swiss Data Science Center, Lausanne, Switzerland (guillaume.obozinski@epfl.ch); (4) University of Cambridge, Cambridge, United Kingdom (ndl21@cam.ac.uk) |
| Pseudocode | Yes | Algorithm 1 Variational inference with synthetic gradients for empirical Bayes |
| Open Source Code | Yes | Our code is available at https://github.com/amzn/xfer. |
| Open Datasets | Yes | MiniImageNet is proposed by Vinyals et al. (2016); it contains 100 classes, split into 64 training classes, 16 validation classes and 20 testing classes; each image is of size 84 × 84. CIFAR-FS is proposed by Bertinetto et al. (2018); it is created by dividing the original CIFAR-100 into 64 training classes, 16 validation classes and 20 testing classes; each image is of size 32 × 32. |
| Dataset Splits | Yes | MiniImageNet... split into 64 training classes, 16 validation classes and 20 testing classes; each image is of size 84 × 84. CIFAR-FS... created by dividing the original CIFAR-100 into 64 training classes, 16 validation classes and 20 testing classes; each image is of size 32 × 32. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. Only general aspects of the network architecture are described, not the underlying hardware. |
| Software Dependencies | No | The paper mentions optimizers like SGD and ADAM but does not provide specific software dependency details such as library names with version numbers (e.g., PyTorch, TensorFlow, or Python versions). |
| Experiment Setup | Yes | We run SGD with batch size 8 for 40,000 steps, where the learning rate is fixed to 10⁻³. During training, we freeze the feature network. To select the best hyper-parameters, we sample 1000 tasks from the validation classes and reuse them at each training epoch. |
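
The Pseudocode row above quotes Algorithm 1, which performs variational inference with synthetic gradients: a learned module predicts the gradient of the task loss with respect to the classifier's query logits, so the task-specific classifier can be refined on unlabeled query examples. The snippet below is a minimal PyTorch sketch of that idea, not the authors' implementation; the module names, feature dimension, number of inner steps and inner learning rate are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, n_way = 64, 5          # assumed feature dimension and 5-way episodes
inner_steps, inner_lr = 3, 1e-3  # assumed number and step size of inner updates

class SyntheticGradientNet(nn.Module):
    """Predicts a surrogate d(loss)/d(logits) from query logits, with no labels needed."""
    def __init__(self, n_way, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_way, hidden), nn.ReLU(), nn.Linear(hidden, n_way))

    def forward(self, logits):
        return self.net(logits)

classifier_init = nn.Linear(feat_dim, n_way)   # meta-learned initialization of the task classifier
synth_grad = SyntheticGradientNet(n_way)       # meta-learned synthetic-gradient module

def adapt_with_synthetic_gradients(query_feats):
    """Inner loop: refine the task classifier using only unlabeled query features."""
    w, b = classifier_init.weight, classifier_init.bias
    for _ in range(inner_steps):
        logits = F.linear(query_feats, w, b)
        surrogate = synth_grad(logits)                       # predicted gradient w.r.t. the logits
        grad_w, grad_b = torch.autograd.grad(
            logits, (w, b), grad_outputs=surrogate, create_graph=True)
        w = w - inner_lr * grad_w                            # differentiable update (kept in the graph)
        b = b - inner_lr * grad_b
    return w, b

# Example usage on dummy query features (75 = 5-way x 15-query examples):
query_feats = torch.randn(75, feat_dim)
w, b = adapt_with_synthetic_gradients(query_feats)
query_logits = F.linear(query_feats, w, b)                   # predictions of the adapted classifier
```

At meta-training time, the loss of the adapted classifier on the labelled query set would be backpropagated through this inner loop to update `classifier_init` and `synth_grad`.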
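The Experiment Setup row describes the outer-loop optimization: SGD over batches of 8 tasks for 40,000 steps at a fixed learning rate of 10⁻³, with the feature network frozen. The sketch below reuses the names from the previous snippet and shows one way such a loop could look; the backbone, episode sampler and loss helper are placeholders rather than the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder backbone standing in for the paper's pre-trained feature extractor;
# the actual architecture is not specified in the quoted row.
feature_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, feat_dim))

# Freeze the feature network during meta-training, as stated in the quoted setup.
feature_net.eval()
for p in feature_net.parameters():
    p.requires_grad_(False)

def sample_episodes(batch_size=8):
    """Hypothetical sampler: returns dummy (query_images, query_labels) pairs per task."""
    return [(torch.randn(75, 3, 84, 84), torch.randint(0, n_way, (75,)))
            for _ in range(batch_size)]

def episode_loss(episode):
    """Adapt with synthetic gradients, then score the adapted classifier on the query labels."""
    images, labels = episode
    feats = feature_net(images)
    w, b = adapt_with_synthetic_gradients(feats)
    return F.cross_entropy(F.linear(feats, w, b), labels)

# Only the meta-parameters are optimized: SGD, batch of 8 tasks, 40,000 steps, lr = 1e-3.
meta_params = list(classifier_init.parameters()) + list(synth_grad.parameters())
optimizer = torch.optim.SGD(meta_params, lr=1e-3)

for step in range(40_000):
    losses = [episode_loss(ep) for ep in sample_episodes(batch_size=8)]
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Hyper-parameter selection on the 1000 reused validation tasks mentioned in the same row would sit outside this loop and is omitted here.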