Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings

Authors: Yihao Xue, Ali Payani, Yu Yang, Baharan Mirzasoleiman

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments, conducted across various model architectures on 8 datasets featuring different types of distribution shifts, reveal that MixPro can outperform baselines by up to 7%, with only 2-4 target examples." and "Empirically, we conduct extensive experiments on 8 datasets, including 3 subpopulation shift datasets Waterbirds (Sagawa et al., 2019), UrbanCars (Li et al., 2023), bFFHQ (Kim et al., 2021) and 5 domain generalization datasets Camelyon17 (Koh et al., 2021), PACS (Li et al., 2017), VLCS (Fang et al., 2013), Office-Home (Venkateswara et al., 2017) and Terra Incognita (Beery et al., 2018)."
Researcher Affiliation | Collaboration | 1 Department of Computer Science, University of California, Los Angeles; 2 Cisco Systems Inc.
Pseudocode | No | The paper describes the MixPro method in two formal steps, (1) mixing source and target embeddings and (2) fitting a linear probe on the mixed embeddings, expressed as equations. However, it does not present a block labeled 'Algorithm' or 'Pseudocode'. (The two steps are sketched in code after the table.)
Open Source Code | No | The paper evaluates (1) the standard ImageNet-pretrained ResNet50 and (2) the ViT-L/16 model pretrained with SWAG (Singh et al., 2022); these models are publicly available in TorchVision, but the paper does not provide code for MixPro itself. (Loading these backbones from TorchVision is sketched after the table.)
Open Datasets | Yes | "Empirically, we conduct extensive experiments on 8 datasets, including 3 subpopulation shift datasets Waterbirds (Sagawa et al., 2019), UrbanCars (Li et al., 2023), bFFHQ (Kim et al., 2021) and 5 domain generalization datasets Camelyon17 (Koh et al., 2021), PACS (Li et al., 2017), VLCS (Fang et al., 2013), Office-Home (Venkateswara et al., 2017) and Terra Incognita (Beery et al., 2018)."
Dataset Splits | Yes | "Therefore, to evaluate if the methods can operate effectively in a true few-shot scenario without additional data, we employ standard k-fold cross-validation using the limited target data available for hyperparameter selection. Considering the smallest case in our experiments, where the target data size is only 4 (2 per class and 2 classes), we set k = 2 to ensure that each fold has at least one data point per class." (This 2-fold protocol is sketched after the table.)
Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU or CPU models, or details about computational resources like cloud instances.
Software Dependencies | No | The paper mentions software such as TorchVision and the Adam optimizer, but does not provide version numbers for any software dependencies required for reproducibility.
Experiment Setup | Yes | "For all methods, following (Chen et al., 2023), we employ the Adam optimizer (Kingma & Ba, 2014) with a batch size of 64 and train for 100 epochs." and Table 1 (hyperparameter range for each method; m.s. = method-specific): lr ∈ {0.1, 0.01, 0.001} and wd ∈ {0.1, 0.01, 0.001} for all methods; method-specific ranges are PRO2: d ∈ {1, 2^2, 2^4, 2^6, 2^8, 2^10}; DFR: none; Mixup: α ∈ {0.2, 0.4, 2^2, 2^3, 2^5}; Teney et al. (2022): λ ∈ {5e-3, 1e-2, 0.1, 1, 5}; MixPro: s ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. (This training recipe and the MixPro grid are sketched after the table.)
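The Pseudocode row above describes MixPro as (1) mixing source and target embeddings and (2) fitting a linear probe on the mixed embeddings. The following is a minimal sketch of those two steps, not the paper's implementation: the pairing rule (each source embedding mixed with a randomly drawn same-class target embedding), the scikit-learn logistic-regression probe, and the toy data are all assumptions; s is the mixing ratio swept in Table 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def mixpro_sketch(src_emb, src_y, tgt_emb, tgt_y, s=0.5, seed=0):
    """Step 1: mix source and target embeddings; step 2: linear probe on the mix.
    The same-class random pairing and the sklearn probe are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    mixed = np.empty_like(src_emb)
    for c in np.unique(src_y):
        src_idx = np.where(src_y == c)[0]
        tgt_idx = np.where(tgt_y == c)[0]
        # reuse the few target embeddings: draw one per same-class source example
        pick = rng.choice(tgt_idx, size=len(src_idx), replace=True)
        mixed[src_idx] = s * tgt_emb[pick] + (1.0 - s) * src_emb[src_idx]
    # step 2: linear probe on the mixed embeddings; labels are unchanged
    return LogisticRegression(max_iter=1000).fit(mixed, src_y)

# toy usage: 2 classes, a large source set, and only 2 target examples per class
rng = np.random.default_rng(0)
src_emb, src_y = rng.normal(size=(200, 16)), np.repeat([0, 1], 100)
tgt_emb, tgt_y = rng.normal(size=(4, 16)), np.array([0, 0, 1, 1])
probe = mixpro_sketch(src_emb, src_y, tgt_emb, tgt_y, s=0.5)
```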
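The Open Source Code row notes that both backbones are available in TorchVision. A possible way to load them and expose their embeddings is shown below; the exact weight enums (including the choice of the end-to-end SWAG checkpoint for ViT-L/16) are assumptions about which released checkpoints were used.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights, vit_l_16, ViT_L_16_Weights

# ImageNet-pretrained ResNet50 from TorchVision
resnet = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()   # expose the 2048-d penultimate embeddings

# ViT-L/16 pretrained with SWAG (Singh et al., 2022) as released in TorchVision;
# the end-to-end fine-tuned checkpoint is an assumption (a linear-probe variant also exists)
vit = vit_l_16(weights=ViT_L_16_Weights.IMAGENET1K_SWAG_E2E_V1)
vit.heads = torch.nn.Identity()   # expose the 1024-d embeddings

resnet.eval()
vit.eval()

# each weights enum ships its own preprocessing transforms, e.g.:
preprocess = ResNet50_Weights.IMAGENET1K_V1.transforms()
```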
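The Dataset Splits row describes hyperparameter selection by k-fold cross-validation over only the few target examples, with k = 2 in the smallest 4-example case. The sketch below illustrates that selection loop; the candidate grid and the logistic-regression stand-in for the method being tuned are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def select_by_kfold(tgt_emb, tgt_y, candidates, k=2, seed=0):
    """Return the candidate with the best mean validation accuracy over k folds
    of the few target examples; k=2 keeps at least one example per class per fold.
    In the paper each candidate would correspond to one method-specific setting
    (trained on the mixed embeddings); a plain logistic regression stands in here."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for params in candidates:
        fold_acc = []
        for tr, va in skf.split(tgt_emb, tgt_y):
            clf = LogisticRegression(max_iter=1000, **params).fit(tgt_emb[tr], tgt_y[tr])
            fold_acc.append(clf.score(tgt_emb[va], tgt_y[va]))
        scores.append(np.mean(fold_acc))
    return candidates[int(np.argmax(scores))]

# smallest case from the quote: 4 target examples, 2 per class
tgt_emb = np.random.default_rng(0).normal(size=(4, 16))
tgt_y = np.array([0, 0, 1, 1])
best = select_by_kfold(tgt_emb, tgt_y, [{"C": c} for c in (0.01, 0.1, 1.0)])
```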
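The Experiment Setup row quotes the shared training recipe (Adam, batch size 64, 100 epochs) and the Table 1 ranges. Below is a minimal PyTorch sketch of a linear probe trained with that recipe together with the MixPro slice of the grid; the cross-entropy loss, the grid-sweep skeleton, and the omission of device handling and validation are assumptions, not the paper's code.

```python
import itertools
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Ranges quoted from Table 1: shared lr/wd plus MixPro's method-specific mixing ratio s
GRID = {
    "lr": [0.1, 0.01, 0.001],
    "wd": [0.1, 0.01, 0.001],
    "s":  [0.1, 0.3, 0.5, 0.7, 0.9],
}

def train_linear_probe(emb, labels, num_classes, lr, wd, epochs=100, batch_size=64):
    """Linear probe trained with Adam, batch size 64, 100 epochs (the quoted recipe);
    the cross-entropy loss and the plain epoch loop are assumptions."""
    probe = nn.Linear(emb.shape[1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr, weight_decay=wd)
    loader = DataLoader(TensorDataset(emb, labels), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(probe(x), y).backward()
            opt.step()
    return probe

# Sweep skeleton: for each (lr, wd, s), mix embeddings with ratio s (not shown),
# train a probe, and score it with the 2-fold target-only protocol sketched above.
for lr, wd, s in itertools.product(GRID["lr"], GRID["wd"], GRID["s"]):
    pass
```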