HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
Authors: Andrey Zhmoginov, Mark Sandler, Max Vladymyrov
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present HYPERTRANSFORMER (HT) experimental results and discuss the implications of our empirical findings. ... Table 1. Comparison of HT with MAML++ and RFS on models of different sizes and different datasets: (a) 20-way OMNIGLOT, (b) 5-way MINIIMAGENET and (c) 5-way TIEREDIMAGENET. |
| Researcher Affiliation | Industry | Andrey Zhmoginov¹ Mark Sandler¹ Max Vladymyrov¹ ... ¹Google Research. Correspondence to: Andrey Zhmoginov <azhmogin@google.com>. |
| Pseudocode | No | The paper describes algorithmic ideas but does not contain any structured pseudocode or algorithm blocks clearly labeled as such. |
| Open Source Code | Yes | The code for the paper can be found at https://github.com/google-research/google-research/tree/master/hypertransformer. |
| Open Datasets | Yes | For our experiments, we chose several most widely used few-shot datasets including OMNIGLOT, MINIIMAGENET and TIEREDIMAGENET. |
| Dataset Splits | No | The paper describes how tasks are sampled for training and evaluation (e.g., 'each training task t ∈ T_train is sampled by first randomly choosing n distinct classes C_t from a large training dataset and then sampling examples without replacement from these classes to generate τ(t) and Q(t).'), but it does not provide specific percentages or counts for training, validation, and test splits of the overall datasets like OMNIGLOT, MINIIMAGENET, and TIEREDIMAGENET. (A sketch of this episode-sampling procedure follows the table.) |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | In all our experiments, we used gradient descent optimizer with a learning rate in the 0.01 to 0.02 range. Our early experiments with more advanced optimizers were unstable. We used a learning rate decay schedule, in which we reduced the learning rate by a factor of 0.95 every 10^5 learning steps. ... For all tasks except 5-shot MINIIMAGENET our Transformer had 3 layers... The 5-shot MINIIMAGENET and TIEREDIMAGENET results presented in Table 1 were obtained with a simplified Transformer model that had 1 layer... (A sketch of this learning-rate schedule follows the table.) |
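
The Dataset Splits row quotes the paper's episode-sampling procedure. As a rough illustration of that procedure (not the authors' implementation), the following Python sketch samples an n-way task by choosing n distinct classes and then drawing support and query examples without replacement; the function and variable names are hypothetical.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way, k_shot, q_queries, rng=random):
    """Sample one few-shot task: pick n distinct classes C_t, then draw
    support (tau(t)) and query (Q(t)) examples without replacement.
    `dataset` is assumed to be an iterable of (example, label) pairs."""
    by_class = defaultdict(list)
    for example, label in dataset:
        by_class[label].append(example)

    classes = rng.sample(sorted(by_class), n_way)  # n distinct classes C_t
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Draw support + query examples for this class without replacement.
        drawn = rng.sample(by_class[cls], k_shot + q_queries)
        support += [(x, episode_label) for x in drawn[:k_shot]]   # tau(t)
        query += [(x, episode_label) for x in drawn[k_shot:]]     # Q(t)
    return support, query
```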
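The Experiment Setup row describes plain gradient descent with a base learning rate in the 0.01 to 0.02 range, reduced by a factor of 0.95 every 10^5 steps. Below is a minimal sketch of that schedule, assuming the decay is applied as a step (staircase) function; the excerpt does not say whether the decay is discrete or continuous.

```python
def learning_rate(step, base_lr=0.01, decay_rate=0.95, decay_steps=100_000):
    """Staircase learning-rate decay: multiply the base rate by 0.95
    after every 10^5 training steps (assumed interpretation)."""
    return base_lr * decay_rate ** (step // decay_steps)

# Example: learning_rate(0) == 0.01; learning_rate(250_000) == 0.01 * 0.95**2
```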