CrIBo: Self-Supervised Learning via Cross-Image Object-Level Bootstrapping
Authors: Tim Lebailly, Thomas Stegmüller, Behzad Bozorgtabar, Jean-Philippe Thiran, Tinne Tuytelaars
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first verify that the proposed pretraining method aligns well with the objective of in-context learning via nearest neighbor retrieval. We then show that in doing so, we do not compromise the performance on standard evaluations. General details on the experimental setup can be found in Appendix A. |
| Researcher Affiliation | Academia | Tim Lebailly (KU Leuven), Thomas Stegmüller (EPFL), Behzad Bozorgtabar (EPFL, CHUV), Jean-Philippe Thiran (EPFL, CHUV), Tinne Tuytelaars (KU Leuven) |
| Pseudocode | No | The paper describes algorithms in text, particularly in Appendix B, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Our code and pretrained models are publicly available at https://github.com/tileb1/CrIBo. |
| Open Datasets | Yes | Our pretraining datasets include COCO (Lin et al., 2014) and ImageNet-1k (Deng et al., 2009). |
| Dataset Splits | Yes | The k-NN classifier is fitted on the local representations of a uniformly sub-sampled set of training images and evaluated on all the patches from the validation set of images. We report the mIoU scores on Pascal VOC 2012 (Everingham et al.) and ADE20K (Zhou et al., 2017). The validation set comprises 1,449 images for Pascal VOC 2012; for ADE20K, the training set comprises 20,210 images and the validation set consists of 2,000 images. (A minimal sketch of this patch-level k-NN evaluation is given below the table.) |
| Hardware Specification | Yes | Experiments are run on a single node with 4x AMD MI250X GPUs (2 compute dies per GPU, i.e., world size = 8) with a memory usage of 43.5 GB per compute die. |
| Software Dependencies | No | The paper mentions using `MMSegmentation (Contributors, 2020)` and the `Adam optimizer (Kingma & Ba, 2014)`, but does not provide specific version numbers for software libraries such as PyTorch, CUDA, or MMSegmentation itself. |
| Experiment Setup | Yes | The ViT-small (ViT-S/16) is trained for 800 epochs, while the ViT-base (ViT-B/16) is trained for 400 epochs. Pretrainings on COCO use a batch size of 256, while pretrainings on ImageNet-1k use a batch size of 1024. Learning rate, weight decay, and other optimization-related hyperparameters are exactly the same as in DINO (Caron et al., 2021). Results reported in tables using ViT-S/16 (apart from the grid search) are based on the following hyperparameters: (λpos, S, K) = (1.0, 25k, 32) and (λpos, S, K) = (2.0, 25k, 64) for pretrainings on ImageNet-1k and COCO, respectively. (These settings are gathered into a configuration sketch below the table.) |
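
The patch-level k-NN evaluation quoted under "Dataset Splits" can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration and is not taken from the CrIBo repository: the frozen feature extractor is omitted, random tensors stand in for ViT patch embeddings, and the value of k and the class count are placeholders.

```python
# Hypothetical sketch of a patch-level k-NN segmentation evaluation:
# fit on patch features of sub-sampled training images, evaluate mIoU on
# validation patches. Not taken from the CrIBo codebase.
import torch
import torch.nn.functional as F

def knn_predict(train_feats, train_labels, val_feats, k=20, num_classes=21):
    """Label each validation patch by majority vote of its k nearest training patches."""
    train_feats = F.normalize(train_feats, dim=-1)
    val_feats = F.normalize(val_feats, dim=-1)
    sims = val_feats @ train_feats.T              # cosine similarities, (N_val, N_train)
    _, idx = sims.topk(k, dim=-1)                 # k nearest training patches per query
    neighbor_labels = train_labels[idx]           # (N_val, k)
    votes = F.one_hot(neighbor_labels, num_classes).sum(dim=1)
    return votes.argmax(dim=-1)

def mean_iou(preds, targets, num_classes=21):
    """mIoU averaged over classes that appear in predictions or targets."""
    ious = []
    for c in range(num_classes):
        inter = ((preds == c) & (targets == c)).sum().item()
        union = ((preds == c) | (targets == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / max(len(ious), 1)

# Toy usage: random features stand in for frozen ViT-S/16 patch embeddings (dim 384).
train_feats, train_labels = torch.randn(10_000, 384), torch.randint(0, 21, (10_000,))
val_feats, val_labels = torch.randn(2_000, 384), torch.randint(0, 21, (2_000,))
preds = knn_predict(train_feats, train_labels, val_feats)
print(f"toy mIoU: {mean_iou(preds, val_labels):.3f}")
```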
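
The hyperparameters quoted under "Experiment Setup" can likewise be collected into a single configuration sketch. The dict layout and field names below are assumptions made for illustration only and do not mirror the authors' config files; S and K are kept as named in the paper, and optimization settings are inherited from DINO as stated above.

```python
# Hedged configuration sketch summarising the quoted pretraining setup.
# Field names are illustrative; they are not the CrIBo repository's config keys.
PRETRAIN_CONFIGS = {
    "imagenet1k_vit_s16": {
        "arch": "vit_small", "patch_size": 16,
        "epochs": 800, "batch_size": 1024,
        "lambda_pos": 1.0, "S": 25_000, "K": 32,   # (λpos, S, K) as quoted
        "optim": "same as DINO (Caron et al., 2021)",
    },
    "coco_vit_s16": {
        "arch": "vit_small", "patch_size": 16,
        "epochs": 800, "batch_size": 256,
        "lambda_pos": 2.0, "S": 25_000, "K": 64,   # (λpos, S, K) as quoted
        "optim": "same as DINO (Caron et al., 2021)",
    },
    # ViT-B/16 is trained for 400 epochs; its (λpos, S, K) values are not quoted above.
    "vit_b16": {"arch": "vit_base", "patch_size": 16, "epochs": 400,
                "optim": "same as DINO (Caron et al., 2021)"},
}

if __name__ == "__main__":
    for name, cfg in PRETRAIN_CONFIGS.items():
        print(name, cfg)
```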