Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching

Authors: Donggyun Kim, Jinwoo Kim, Seongwoong Cho, Chong Luo, Seunghoon Hong

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision) and sometimes outperforms them using 0.1% of full supervision.
Researcher Affiliation | Collaboration | Donggyun Kim (1), Jinwoo Kim (1), Seongwoong Cho (1), Chong Luo (2), Seunghoon Hong (1); (1) School of Computing, KAIST; (2) Microsoft Research Asia
Pseudocode | No | The paper describes its methods through text and mathematical equations (e.g., Eq. 3, 4, 5, 8, 9, 10, 11) but does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/GitGyun/visual_token_matching.
Open Datasets | Yes | We construct a variant of the Taskonomy dataset (Zamir et al., 2018) to simulate few-shot learning of unseen dense prediction tasks. Taskonomy contains indoor images with various annotations, from which we choose ten dense prediction tasks of diverse semantics and output dimensions: semantic segmentation (SS), surface normal (SN), Euclidean distance (ED), Z-buffer depth (ZD), texture edge (TE), occlusion edge (OE), 2D keypoints (K2), 3D keypoints (K3), reshading (RS), and principal curvature (PC).
Dataset Splits | Yes | We use the train/val/test split of the Taskonomy-tiny partition provided by Zamir et al. (2018). We use the train and val splits for training and early stopping, respectively, and use the muleshoe building included in the test split for evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments; it only mentions general terms like 'GPU', e.g., in Section C.2.2, which reports MACs of Ours and DPT for a single-query inference on a single-channel task.
Software Dependencies | No | The paper mentions using the Adam optimizer and the 'thop' library (with a link to its GitHub repository) but does not provide specific version numbers for these or for other key software components such as Python or PyTorch.
Experiment Setup | Yes | We train all models for 300,000 iterations using the Adam optimizer (Kingma & Ba, 2015), and use a poly learning-rate schedule (Liu et al., 2015) with base learning rates of 10^-5 for pretrained parameters and 10^-4 for parameters trained from scratch. The models are early-stopped based on the validation metric. At each iteration of episodic training, we sample a batch of 8 episodes. In each episode, we construct a 5-channel task from the training tasks T_train by first splitting all channels of the training tasks and randomly sampling 5 channels among them. Then support and query sets are sampled for the selected channels, with support and query sizes of 4 for Ours and DGP, and 1 for HSNet and VAT as they only support 1-shot training. ... Finally, we apply Dropout (Srivastava et al., 2014) with rate 0.1 in the attention scores.
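
The optimization details quoted in the Experiment Setup row translate into a short configuration. Below is a minimal PyTorch sketch, not the authors' implementation: the parameter-group helpers (`pretrained_params`, `scratch_params`) and the poly exponent of 0.9 (the common default from Liu et al., 2015) are assumptions, not stated in the excerpt.

```python
import torch

MAX_ITERS = 300_000  # 300,000 training iterations, as quoted above


def build_optimizer_and_scheduler(pretrained_params, scratch_params, power=0.9):
    # Two parameter groups with the base learning rates quoted above.
    optimizer = torch.optim.Adam([
        {"params": pretrained_params, "lr": 1e-5},  # pretrained parameters
        {"params": scratch_params, "lr": 1e-4},     # parameters trained from scratch
    ])
    # Poly schedule: lr(t) = base_lr * (1 - t / MAX_ITERS) ** power
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda step: (1.0 - step / MAX_ITERS) ** power
    )
    return optimizer, scheduler
```

Each training iteration would call `optimizer.step()` followed by `scheduler.step()`; early stopping on the validation metric is handled outside this sketch.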
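The episodic sampling procedure (8 episodes per batch, 5-channel tasks drawn from the pooled channels of the training tasks, support/query size 4) can likewise be sketched. The data-access helpers `task.num_channels` and `sample_labeled_images` are hypothetical stand-ins; only the numbers come from the excerpt.

```python
import random

EPISODES_PER_BATCH = 8
CHANNELS_PER_TASK = 5
SUPPORT_SIZE = QUERY_SIZE = 4  # for Ours and DGP; HSNet and VAT use 1


def sample_episode(train_tasks, sample_labeled_images):
    # Split every training task into its individual output channels,
    # then draw 5 channels to form one synthetic multi-channel task.
    channels = [(task, c) for task in train_tasks for c in range(task.num_channels)]
    selected = random.sample(channels, CHANNELS_PER_TASK)
    # Draw support and query sets labeled for the selected channels.
    support = sample_labeled_images(selected, SUPPORT_SIZE)
    query = sample_labeled_images(selected, QUERY_SIZE)
    return support, query


def sample_batch(train_tasks, sample_labeled_images):
    return [sample_episode(train_tasks, sample_labeled_images)
            for _ in range(EPISODES_PER_BATCH)]
```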
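Finally, the "Dropout with rate 0.1 in the attention scores" detail is illustrated below. This is a generic scaled-dot-product sketch, assuming the common convention of applying dropout to the softmax-normalized attention weights; the excerpt does not specify whether dropout is applied before or after the softmax.

```python
import torch.nn.functional as F


def attention_with_score_dropout(q, k, v, p=0.1, training=True):
    # Scaled dot-product attention with dropout (rate 0.1) on the attention weights.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    weights = F.dropout(scores.softmax(dim=-1), p=p, training=training)
    return weights @ v
```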