Mitigating the Effect of Incidental Correlations on Part-based Learning
Authors: Gaurav Bhatt, Deepayan Das, Leonid Sigal, Vineeth N Balasubramanian
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By reducing the impact of incidental background correlations on the learned parts, we exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets, including MiniImageNet, TieredImageNet, and FC100. We also demonstrate that the part-based representations acquired through our approach generalize better than existing techniques, even under domain shifts of the background and common data corruption on the ImageNet-9 dataset. The implementation is available on GitHub: https://github.com/GauravBh1010tt/DPViT.git |
| Researcher Affiliation | Academia | Gaurav Bhatt¹³, Deepayan Das², Leonid Sigal¹³, Vineeth N Balasubramanian²; ¹The University of British Columbia, ²Indian Institute of Technology Hyderabad, ³The Vector Institute, Canada |
| Pseudocode | No | The paper describes the proposed methodology using mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The implementation is available on Git Hub: https://github.com/GauravBh1010tt/DPViT.git |
| Open Datasets | Yes | We evaluate the proposed approach on four datasets: MiniImageNet [35], TieredImageNet [40], FC100 [37], and ImageNet-9 [53]. |
| Dataset Splits | Yes | For MiniImageNet, we use the data split proposed in [39], where the classes are split into 64, 16, and 20 for training, validation, and testing, respectively. TieredImageNet [40] contains 608 classes divided into 351, 97, and 160 for meta-training, meta-validation, and meta-testing. FC100 [37] is a smaller-resolution dataset (32×32) that contains 100 classes with a class split of 60, 20, and 20. (A split sketch follows the table.) |
| Hardware Specification | Yes | The pre-training and fine-tuning are carried out on 4 A40 GPUs. |
| Software Dependencies | No | The paper mentions using methods from the iBOT paper but does not provide specific software names with version numbers for reproducibility (e.g., Python version, PyTorch/TensorFlow version, CUDA version). |
| Experiment Setup | Yes | By default, we use the ViT-Small architecture, which consists of 21 million parameters. The patch size is set to 16 as our default configuration. The student and teacher networks have a shared projection head for the [cls] token output. The projection heads for both networks have an output dimension of 8192. We adopt a linear warm-up strategy for the learning rate over 10 epochs, starting from a base value of 5e-4, and then decaying it to 1e-5 using a cosine schedule. Similarly, the weight decay follows a cosine schedule from 0.04 to 0.4. We employ a multi-crop strategy to improve performance, with 2 global crops (224×224) and 10 local crops (96×96). The scale ranges for global and local crops are (0.4, 1.0) and (0.05, 0.4), respectively. Following [60], we use only the local crops for self-distillation with global crops from the same image. Additionally, we apply blockwise masking to the global crops fed to the student network. The masking ratio is uniformly sampled from [0.1, 0.5] with a probability of 0.5, and set to 0 otherwise. Our batch size is 480, with a per-GPU batch size of 120. DPViT is pre-trained for 500 epochs on the training set of each dataset. (A schedule sketch follows the table.) |
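
To make the class splits quoted in the "Dataset Splits" row concrete, here is a minimal summary sketch; the dictionary and variable names are illustrative assumptions, not identifiers from the DPViT repository.

```python
# Hypothetical summary of the class splits quoted above; names are
# illustrative, not taken from the DPViT codebase.
DATASET_CLASS_SPLITS = {
    # dataset: (meta-train, meta-val, meta-test) class counts
    "miniImageNet":   (64, 16, 20),    # split of [39], 100 classes total
    "tieredImageNet": (351, 97, 160),  # 608 classes total
    "FC100":          (60, 20, 20),    # 32x32 images, 100 classes total
}

for name, (train, val, test) in DATASET_CLASS_SPLITS.items():
    print(f"{name}: {train} train / {val} val / {test} test "
          f"({train + val + test} classes total)")
```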
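
The "Experiment Setup" row describes a DINO/iBOT-style pre-training recipe. The sketch below illustrates those quoted settings under that assumption: the function and constant names (`cosine_scheduler`, `sample_mask_ratio`, `ITERS_PER_EPOCH`, the crop dictionaries) are hypothetical and not the authors' implementation.

```python
# Minimal sketch of the quoted pre-training schedule and augmentation settings,
# assuming a per-iteration cosine scheduler with linear warm-up as used in
# DINO/iBOT-style training. All names here are illustrative assumptions.
import numpy as np

def cosine_scheduler(base, final, epochs, iters_per_epoch,
                     warmup_epochs=0, warmup_start=0.0):
    """Per-iteration schedule: linear warm-up followed by cosine decay to `final`."""
    warmup_iters = warmup_epochs * iters_per_epoch
    warmup = np.linspace(warmup_start, base, warmup_iters)
    iters = np.arange(epochs * iters_per_epoch - warmup_iters)
    cosine = final + 0.5 * (base - final) * (1 + np.cos(np.pi * iters / len(iters)))
    return np.concatenate((warmup, cosine))

EPOCHS = 500
ITERS_PER_EPOCH = 100  # depends on dataset size and the 480-image global batch

# Learning rate: 10-epoch linear warm-up to 5e-4, then cosine decay to 1e-5.
lr_schedule = cosine_scheduler(5e-4, 1e-5, EPOCHS, ITERS_PER_EPOCH, warmup_epochs=10)
# Weight decay: cosine schedule that *increases* from 0.04 to 0.4.
wd_schedule = cosine_scheduler(0.04, 0.4, EPOCHS, ITERS_PER_EPOCH)

# Multi-crop configuration: 2 global crops (224x224) and 10 local crops (96x96).
GLOBAL_CROPS = dict(num=2, size=224, scale=(0.4, 1.0))
LOCAL_CROPS = dict(num=10, size=96, scale=(0.05, 0.4))

def sample_mask_ratio(rng: np.random.Generator) -> float:
    """Blockwise-masking ratio for the student's global crops:
    uniform in [0.1, 0.5] with probability 0.5, otherwise 0."""
    return rng.uniform(0.1, 0.5) if rng.random() < 0.5 else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("lr at start/end of warm-up:", lr_schedule[0], lr_schedule[10 * ITERS_PER_EPOCH])
    print("weight decay at start/end:", wd_schedule[0], wd_schedule[-1])
    print("example mask ratios:", [round(sample_mask_ratio(rng), 3) for _ in range(5)])
```

Note the inverted weight-decay schedule (0.04 rising to 0.4) while the learning rate decays; this mirrors common self-distillation practice and matches the values quoted in the table.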