Masked Unsupervised Self-training for Label-free Image Classification
Authors: Junnan Li, Silvio Savarese, Steven Hoi
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of MUST on a variety of downstream tasks, where it improves upon CLIP by a large margin. MUST also outperforms supervised few-shot adaptation methods. It achieves a top-1 accuracy of 77.7% on ImageNet using ViT-B, +9.4% higher than CLIP, and +6.2% higher than 16-shot CLIP adaptation. Our code is available at https://github.com/salesforce/MUST. ... We validate the efficacy of MUST on 8 image classification tasks across a variety of domains, showing significant improvement over CLIP (Radford et al., 2021). ... We perform experiments on 8 image classification datasets which span many different domains... |
| Researcher Affiliation | Industry | Junnan Li, Silvio Savarese, Steven Hoi; Salesforce Research; {junnan.li,ssavarese,shoi}@salesforce.com |
| Pseudocode | No | The paper describes the proposed method, MUST, using textual explanations and figures (Figure 1 and Figure 2) for illustration. However, it does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code is available at https://github.com/salesforce/MUST. |
| Open Datasets | Yes | ImageNet (Deng et al., 2009), SUN397 (Xiao et al., 2010), Food101 (Bossard et al., 2014), GTSRB (Stallkamp et al., 2011), DTD (Cimpoi et al., 2014), UCF101 (Soomro et al., 2012), Oxford Pets (Parkhi et al., 2012), Caltech101 (Fei-Fei et al., 2004). |
| Dataset Splits | No | The paper lists 'Train size' and 'Test size' for each dataset in Table 1 but does not provide the details of a validation split (e.g., percentages, sample counts, or a predefined validation set) needed for reproduction. It mentions 'validation images' only in the context of qualitative analysis (Figures 3 and 4), not as part of a defined train/validation split. |
| Hardware Specification | Yes | We use 16 A100 GPUs |
| Software Dependencies | No | The paper mentions using specific pre-trained models ('ViT-B/16 and ViT-L/14') and an optimizer ('AdamW'), but it does not specify software dependencies like programming language versions (e.g., Python 3.x) or library versions (e.g., PyTorch 1.x) with specific version numbers. |
| Experiment Setup | Yes | During finetuning, we use AdamW (Loshchilov & Hutter, 2017) optimizer with a weight decay of 0.05. We employ a cosine learning rate schedule without any warmup. ... The batch size is 1024 for ViT-B/16 and 512 for ViT-L/14, and the learning rate is scaled linearly with the batch size (lr = base_lr × batch_size / 256). ... Table 10 provides more details of the hyperparameters used for each downstream task, including the pseudo-label threshold, mask patch size, mask ratio, etc. |
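
The optimizer settings quoted in the Experiment Setup row (AdamW with weight decay 0.05, cosine schedule without warmup, and the linear learning-rate scaling rule) translate into a few lines of PyTorch. The sketch below is a minimal illustration, not the authors' released implementation; `base_lr` and `total_steps` are assumed placeholder values that the excerpt above does not specify.

```python
# Minimal sketch of the finetuning optimizer setup described in the paper excerpt:
# AdamW, weight decay 0.05, cosine LR schedule with no warmup, and the linear
# scaling rule lr = base_lr * batch_size / 256. base_lr and total_steps are
# illustrative assumptions, not values taken from the MUST paper or code.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR


def build_optimizer(model: torch.nn.Module,
                    base_lr: float = 1e-4,    # assumed base LR per 256 samples
                    batch_size: int = 1024,   # ViT-B/16 setting from the paper
                    total_steps: int = 10_000):  # assumed schedule length
    # Linear scaling rule from the paper: lr = base_lr * batch_size / 256
    lr = base_lr * batch_size / 256

    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=0.05)

    # Cosine learning-rate schedule without any warmup
    scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)
    return optimizer, scheduler
```

With these defaults the ViT-B/16 configuration (batch size 1024) yields an effective learning rate four times the assumed `base_lr`, while the ViT-L/14 configuration (batch size 512) would yield twice `base_lr`, which is what the quoted scaling rule prescribes.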