Masked Unsupervised Self-training for Label-free Image Classification

Authors: Junnan Li, Silvio Savarese, Steven Hoi

ICLR 2023

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "We demonstrate the efficacy of MUST on a variety of downstream tasks, where it improves upon CLIP by a large margin. MUST also outperforms supervised few-shot adaptation methods. It achieves a top-1 accuracy of 77.7% on ImageNet using ViT-B, +9.4% higher than CLIP, and +6.2% higher than 16-shot CLIP adaptation. Our code is available at https://github.com/salesforce/MUST. ... We validate the efficacy of MUST on 8 image classification tasks across a variety of domains, showing significant improvement over CLIP (Radford et al., 2021). ... We perform experiments on 8 image classification datasets which span many different domains..."
Researcher Affiliation: Industry. "Junnan Li, Silvio Savarese, Steven Hoi, Salesforce Research, {junnan.li,ssavarese,shoi}@salesforce.com"
Pseudocode: No. The paper describes the proposed method, MUST, through textual explanations and figures (Figures 1 and 2), but it does not include any explicitly labeled pseudocode blocks or algorithms.
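Since the paper itself provides no pseudocode, the following is a minimal, hypothetical sketch of a masked self-training step of the kind the title and the quoted hyperparameters (pseudo-label threshold, mask patch size, mask ratio) suggest: an EMA teacher produces pseudo-labels on weakly augmented images, and a student is trained on strongly augmented, masked images with a confidence-filtered classification loss plus a generic masked-reconstruction loss. All interfaces, the stand-in model, the 0.7 threshold, and the reconstruction target are assumptions for illustration, not the authors' implementation; see https://github.com/salesforce/MUST for the actual method.

```python
import copy

import torch
import torch.nn.functional as F


def patchify(imgs, patch_size=16):
    """Split images of shape (B, C, H, W) into flattened patches (B, N, C*p*p)."""
    b, c, h, w = imgs.shape
    p = patch_size
    x = imgs.reshape(b, c, h // p, p, w // p, p)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(b, (h // p) * (w // p), c * p * p)


@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Exponential-moving-average update of the teacher from the student."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s.detach(), alpha=1 - momentum)


class TinyModel(torch.nn.Module):
    """Stand-in for a ViT backbone: returns class logits and patch reconstructions."""

    def __init__(self, num_classes=10, num_patches=196, patch_dim=3 * 16 * 16):
        super().__init__()
        self.pool = torch.nn.AdaptiveAvgPool2d(1)
        self.cls_head = torch.nn.Linear(3, num_classes)
        self.recon_head = torch.nn.Linear(3, patch_dim)
        self.num_patches = num_patches

    def forward(self, imgs, mask=None):
        # The stand-in ignores the mask; a real ViT would drop/replace masked patches.
        feat = self.pool(imgs).flatten(1)  # (B, 3) global colour statistics
        logits = self.cls_head(feat)
        recon = self.recon_head(feat).unsqueeze(1).expand(-1, self.num_patches, -1)
        return logits, recon


def self_training_step(student, teacher, weak_imgs, strong_imgs, mask,
                       conf_threshold=0.7):
    """One step: confident teacher pseudo-labels + masked-reconstruction loss."""
    # 1) Teacher pseudo-labels on weakly augmented views (no gradient).
    with torch.no_grad():
        t_logits, _ = teacher(weak_imgs)
        conf, pseudo_labels = F.softmax(t_logits, dim=-1).max(dim=-1)
        keep = (conf >= conf_threshold).float()  # keep only confident samples

    # 2) Student predictions on strongly augmented views with masked patches.
    logits, recon = student(strong_imgs, mask)

    # Classification loss on confident pseudo-labels only.
    per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
    cls_loss = (per_sample * keep).sum() / keep.sum().clamp(min=1.0)

    # Generic masked-reconstruction loss on the masked patches of the clean view.
    target = patchify(weak_imgs)
    sq_err = (recon - target) ** 2 * mask.unsqueeze(-1)
    mim_loss = sq_err.sum() / (mask.sum().clamp(min=1.0) * recon.size(-1))

    return cls_loss + mim_loss


if __name__ == "__main__":
    student = TinyModel()
    teacher = copy.deepcopy(student)           # teacher starts as a copy of the student
    weak = torch.rand(4, 3, 224, 224)
    strong = torch.rand(4, 3, 224, 224)
    mask = (torch.rand(4, 196) < 0.5).float()  # 1 = patch masked out
    loss = self_training_step(student, teacher, weak, strong, mask)
    loss.backward()
    ema_update(teacher, student)
    print(f"loss = {loss.item():.3f}")
```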
Open Source Code: Yes. "Our code is available at https://github.com/salesforce/MUST."
Open Datasets: Yes. ImageNet (Deng et al., 2009), SUN397 (Xiao et al., 2010), Food101 (Bossard et al., 2014), GTSRB (Stallkamp et al., 2011), DTD (Cimpoi et al., 2014), UCF101 (Soomro et al., 2012), Oxford Pets (Parkhi et al., 2012), Caltech101 (Binh, 2011).
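The paper does not say how the datasets were obtained; several of them happen to ship with torchvision, so a convenience sketch for fetching a few of them could look as follows. This assumes a reasonably recent torchvision release that includes these dataset classes and is not the authors' data pipeline; ImageNet and UCF101 in particular require separate manual preparation.

```python
# Convenience sketch (not the authors' pipeline): fetch a few of the benchmark
# datasets via torchvision. ImageNet and UCF101 need separate preparation.
from torchvision import datasets

root = "./data"  # hypothetical local path
food101 = datasets.Food101(root, split="train", download=True)
dtd = datasets.DTD(root, split="train", download=True)
gtsrb = datasets.GTSRB(root, split="train", download=True)
pets = datasets.OxfordIIITPet(root, split="trainval", download=True)
print(len(food101), len(dtd), len(gtsrb), len(pets))
```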
Dataset Splits: No. The paper lists "Train size" and "Test size" for each dataset in Table 1, but it does not provide the details of validation splits (e.g., percentages, sample counts, or predefined validation sets) needed for reproduction. It mentions "validation images" only in the context of qualitative analysis (Figures 3 and 4), not in relation to how the data were split or the split sizes.
Hardware Specification: Yes. "We use 16 A100 GPUs."
Software Dependencies: No. The paper mentions specific pre-trained models (ViT-B/16 and ViT-L/14) and an optimizer (AdamW), but it does not specify software dependencies with version numbers, such as the programming language (e.g., Python 3.x) or libraries (e.g., PyTorch 1.x).
Experiment Setup: Yes. "During finetuning, we use AdamW (Loshchilov & Hutter, 2017) optimizer with a weight decay of 0.05. We employ a cosine learning rate schedule without any warmup. ... The batch size is 1024 for ViT-B/16 and 512 for ViT-L/14, and the learning rate is scaled linearly with the batch size (lr = base_lr × batch_size / 256). ... Table 10 provides more details of the hyperparameters used for each downstream task, including the pseudo-label threshold, mask patch size, mask ratio, etc."
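A minimal sketch of the quoted optimization recipe is given below: AdamW with weight decay 0.05, a learning rate scaled linearly with batch size (lr = base_lr × batch_size / 256), and a cosine schedule with no warmup. The function name and the values of base_lr, num_epochs, and steps_per_epoch are illustrative placeholders; the per-task hyperparameters are listed in the paper's Table 10.

```python
import torch


def build_optimizer(model, base_lr, batch_size, num_epochs, steps_per_epoch):
    lr = base_lr * batch_size / 256  # linear scaling rule from the quoted setup
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.05)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=num_epochs * steps_per_epoch)  # cosine decay, no warmup
    return optimizer, scheduler


# Example: the ViT-B/16 batch size of 1024 (base_lr and epoch count are illustrative).
model = torch.nn.Linear(768, 1000)
optimizer, scheduler = build_optimizer(model, base_lr=1e-4, batch_size=1024,
                                       num_epochs=30, steps_per_epoch=1000)
```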