A Universal Representation Transformer Layer for Few-Shot Image Classification

Authors: Lu Liu, William L. Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that URT sets a new state-of-the-art result on Meta-Dataset. Specifically, it achieves top-performance on the highest number of data sources compared to competing methods. We analyze variants of URT and present a visualization of the attention score heatmaps that sheds light on how the model performs cross-domain generalization.
Researcher Affiliation | Collaboration | Lu Liu 1,2, William Hamilton 1,3, Guodong Long 2, Jing Jiang 2, Hugo Larochelle 1,4. 1 Mila, 2 Australian AI Institute, UTS, 3 McGill University, 4 Google Research, Brain Team. Correspondence to lu.liu.cs@icloud.com
Pseudocode | Yes | Algorithm 1: Training of URT layer
Open Source Code | Yes | Our code is available at https://github.com/liulu112601/URT.
Open Datasets | Yes | We test our methods on the large-scale few-shot learning benchmark Meta-Dataset (Triantafillou et al., 2020).
Dataset Splits | Yes | Meta-Dataset includes ten datasets (domains), with eight of them available for training. Additionally, each task sampled in the benchmark varies in the number of classes N, with each class also varying in the number of shots K. As in all few-shot learning benchmarks, the classes used for training and testing do not overlap. ... We chose the hyper-parameters based on the performance of the validation set.
Hardware Specification | Yes | Of note, the average inference time for URT is 0.04 seconds per task, compared to 0.43 for SUR, on a single V100.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow, or CUDA version).
Experiment Setup | Yes | Then, we freeze the backbone and train the URT layer for 10,000 episodes, with an initial learning rate of 0.01 and a cosine learning rate scheduler. ... URT is trained with parameter weight decay of 1e-5 and with a regularization factor λ = 0.1. The number of heads (H in Equation 7) is set to 2, and the dimension of the keys and queries (l in Equation 4) is set to 1024.
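
To make the Dataset Splits row above concrete: Meta-Dataset tasks vary both in the number of classes N and in the number of shots K per class, and the classes used for training and testing do not overlap. The snippet below is a minimal, hypothetical episode sampler illustrating that structure only; the benchmark ships its own sampling pipeline, so the helper name `sample_episode` and the toy class lists are assumptions made purely for illustration.

```python
import random

def sample_episode(class_to_images, n_min=2, n_max=10, k_min=1, k_max=5, n_query=10):
    """Hypothetical sampler: each episode draws a variable number of classes N
    and, for each class, a variable number of support shots K, mimicking the
    variable-way, variable-shot structure of Meta-Dataset tasks."""
    n_way = random.randint(n_min, min(n_max, len(class_to_images)))
    classes = random.sample(sorted(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        images = list(class_to_images[cls])
        random.shuffle(images)
        k_shot = random.randint(k_min, min(k_max, len(images) - 1))
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:k_shot + n_query]]
    return support, query

# Classes seen at meta-training time are disjoint from those used at test time,
# as in all few-shot learning benchmarks (toy class/image names for illustration).
train_classes = {f"train_class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(50)}
test_classes = {f"test_class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(20)}

support, query = sample_episode(train_classes)
print(len(support), "support examples,", len(query), "query examples")
```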
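
Likewise, the hyper-parameters quoted in the Experiment Setup row (10,000 training episodes, initial learning rate 0.01 with a cosine schedule, weight decay 1e-5, regularization factor λ = 0.1, H = 2 heads, key/query dimension l = 1024) could be wired together roughly as below. This is a sketch assuming a PyTorch implementation; the `URTHead` module, the SGD choice, and the placeholder loss are stand-ins rather than the authors' actual code, which is available in the linked repository.

```python
import torch
from torch import nn, optim

class URTHead(nn.Module):
    """Stand-in for one URT attention head: it scores the frozen, domain-specific
    backbone features with a query/key dot product and returns their weighted
    combination. A sketch of the general idea, not the paper's exact Equations 4-7."""

    def __init__(self, feat_dim: int = 512, key_dim: int = 1024):
        super().__init__()
        self.to_query = nn.Linear(feat_dim, key_dim)
        self.to_key = nn.Linear(feat_dim, key_dim)
        self.scale = key_dim ** -0.5

    def forward(self, domain_feats: torch.Tensor) -> torch.Tensor:
        # domain_feats: (n_domains, feat_dim), one vector per frozen backbone.
        task_query = self.to_query(domain_feats.mean(dim=0))         # (key_dim,)
        keys = self.to_key(domain_feats)                             # (n_domains, key_dim)
        attn = torch.softmax(keys @ task_query * self.scale, dim=0)  # (n_domains,)
        return attn @ domain_feats                                   # (feat_dim,)

feat_dim, n_domains, num_episodes = 512, 8, 10_000
heads = nn.ModuleList([URTHead(feat_dim) for _ in range(2)])  # H = 2 heads

# The backbones stay frozen; only the URT parameters are trained,
# with weight decay 1e-5 and a cosine schedule over the 10,000 episodes.
optimizer = optim.SGD(heads.parameters(), lr=0.01, weight_decay=1e-5)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_episodes)
lam = 0.1  # regularization factor λ from the paper

for episode in range(num_episodes):
    domain_feats = torch.randn(n_domains, feat_dim)  # placeholder for real backbone features
    universal = torch.cat([h(domain_feats) for h in heads])
    reg = sum(h.to_query.weight.pow(2).mean() for h in heads)  # placeholder for the paper's regularizer
    loss = universal.pow(2).mean() + lam * reg                  # placeholder for the episodic loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```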