Mitigating the Effect of Incidental Correlations on Part-based Learning

Authors: Gaurav Bhatt, Deepayan Das, Leonid Sigal, Vineeth N Balasubramanian

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By reducing the impact of incidental background correlations on the learned parts, we exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets, including MiniImageNet, TieredImageNet, and FC100. We also demonstrate that the part-based representations acquired through our approach generalize better than existing techniques, even under domain shifts of the background and common data corruption on the ImageNet-9 dataset. The implementation is available on GitHub: https://github.com/GauravBh1010tt/DPViT.git
Researcher Affiliation | Academia | Gaurav Bhatt (1,3), Deepayan Das (2), Leonid Sigal (1,3), Vineeth N Balasubramanian (2); (1) The University of British Columbia, (2) Indian Institute of Technology Hyderabad, (3) The Vector Institute, Canada
Pseudocode | No | The paper describes the proposed methodology using mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation is available on GitHub: https://github.com/GauravBh1010tt/DPViT.git
Open Datasets | Yes | We evaluate the proposed approach on four datasets: MiniImageNet [35], TieredImageNet [40], FC100 [37], and ImageNet-9 [53].
Dataset Splits | Yes | For MiniImageNet, we use the data split proposed in [39], where the classes are split into 64, 16, and 20 for training, validation, and testing, respectively. TieredImageNet [40] contains 608 classes divided into 351, 97, and 160 for meta-training, meta-validation, and meta-testing. FC100 [37], on the other hand, is a smaller-resolution dataset (32x32) that contains 100 classes with a class split of 60, 20, and 20.
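As a rough illustration of the splits quoted above, the class counts can be captured in a small configuration sketch. The names below (FEW_SHOT_SPLITS, check_split) are hypothetical and are not taken from the DPViT repository; only the numbers come from the quoted text.

import sys

FEW_SHOT_SPLITS = {
    # dataset: (train classes, validation classes, test classes)
    "miniimagenet":   (64, 16, 20),    # split of [39]; 100 classes total
    "tieredimagenet": (351, 97, 160),  # 608 classes total
    "fc100":          (60, 20, 20),    # 32x32 images; 100 classes total
}

def check_split(name):
    """Sanity-check that the per-split class counts sum to the dataset total."""
    train, val, test = FEW_SHOT_SPLITS[name]
    print(f"{name}: {train} + {val} + {test} = {train + val + test} classes")

for dataset in FEW_SHOT_SPLITS:
    check_split(dataset)

Running the sketch simply confirms that the per-split counts add up to the stated dataset totals (100 classes for MiniImageNet and FC100, 608 for TieredImageNet).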
Hardware Specification | Yes | The pre-training and fine-tuning are carried out on 4 A40 GPUs.
Software Dependencies | No | The paper mentions using methods from the iBOT paper but does not provide specific software names with version numbers for reproducibility (e.g., Python version, PyTorch/TensorFlow version, CUDA version).
Experiment Setup | Yes | By default, we use the ViT-Small architecture, which consists of 21 million parameters. The patch size is set to 16 as our default configuration. The student and teacher networks have a shared projection head for the [cls] token output. The projection heads for both networks have an output dimension of 8192. We adopt a linear warm-up strategy for the learning rate over 10 epochs, starting from a base value of 5e-4, and then decaying it to 1e-5 using a cosine schedule. Similarly, the weight decay is decayed using a cosine schedule from 0.04 to 0.4. We employ a multi-crop strategy to improve performance, with 2 global crops (224x224) and 10 local crops (96x96). The scale ranges for global and local crops are (0.4, 1.0) and (0.05, 0.4), respectively. Following [60], we use only the local crops for self-distillation with global crops from the same image. Additionally, we apply blockwise masking to the global crops fed into the student network. The masking ratio is uniformly sampled from [0.1, 0.5] with a probability of 0.5, and with a probability of 0.5 it is set to 0. Our batch size is 480, with a per-GPU batch size of 120. DPViT is pre-trained for 500 epochs on the training set of each dataset.
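The schedules described above follow the usual DINO/iBOT-style recipe of a linear warm-up followed by cosine decay. The sketch below is a minimal per-epoch approximation under that assumption; the function name cosine_scheduler, the per-epoch (rather than per-iteration) granularity, and sample_mask_ratio are illustrative choices, not the paper's implementation.

import math
import random

def cosine_scheduler(base_value, final_value, total_epochs, warmup_epochs=0, warmup_start=0.0):
    """Linear warm-up from warmup_start to base_value, then cosine decay to final_value."""
    schedule = []
    for epoch in range(total_epochs):
        if epoch < warmup_epochs:
            # linear warm-up
            value = warmup_start + (base_value - warmup_start) * epoch / warmup_epochs
        else:
            # cosine decay from base_value down (or up) to final_value
            progress = (epoch - warmup_epochs) / max(total_epochs - warmup_epochs, 1)
            value = final_value + 0.5 * (base_value - final_value) * (1 + math.cos(math.pi * progress))
        schedule.append(value)
    return schedule

# Learning rate: warm up to 5e-4 over 10 epochs, then cosine-decay to 1e-5 over 500 epochs.
lr_schedule = cosine_scheduler(5e-4, 1e-5, total_epochs=500, warmup_epochs=10)

# Weight decay: cosine schedule from 0.04 to 0.4 over the 500 pre-training epochs.
wd_schedule = cosine_scheduler(0.04, 0.4, total_epochs=500)

def sample_mask_ratio():
    """Blockwise masking ratio for the student's global crops:
    uniform in [0.1, 0.5] with probability 0.5, otherwise 0."""
    return random.uniform(0.1, 0.5) if random.random() < 0.5 else 0.0

In practice, lr_schedule[epoch] and wd_schedule[epoch] would be written into the optimizer's parameter groups at the start of each epoch (or each iteration in a finer-grained variant), and sample_mask_ratio() would be drawn once per global crop before blockwise masking is applied.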