You Don’t Need Domain-Specific Data Augmentations When Scaling Self-Supervised Learning

Authors: Théo Moutakanni, Maxime Oquab, Marc Szafraniec, Maria Vakalopoulou, Piotr Bojanowski

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we challenge the importance of invariance and data augmentation in joint-embedding architectures (JEAs) at scale. By running a case study on a recent SSL foundation model, DINOv2, we show that strong image representations can be obtained with JEAs and only cropping without resizing, provided the training data is large enough, reaching state-of-the-art results and using the least amount of augmentation in the literature. Through this study, we also discuss the impact of compute constraints on the outcomes of experimental deep learning research, showing that they can lead to very different conclusions. (A crop-only augmentation sketch is given after the table.)
Researcher Affiliation | Collaboration | Théo Moutakanni (FAIR at Meta; MICS, Université Paris-Saclay; theomoutakanni@meta.com), Maxime Oquab (FAIR at Meta), Marc Szafraniec (FAIR at Meta), Maria Vakalopoulou (MICS, CentraleSupélec, Université Paris-Saclay), Piotr Bojanowski (FAIR at Meta)
Pseudocode | No | Insufficient information. The paper describes its algorithms and loss functions in prose but does not present them in a pseudocode or algorithm block. (A generic sketch of a DINO-style training step is given after the table.)
Open Source Code | Yes | We refer to the official DINOv2 repository for the exact implementation details: https://github.com/facebookresearch/dinov2 (Apache-2.0 license). (A loading example is given after the table.)
Open Datasets | Yes | For our study, we use the standard ImageNet-1k [41] and ImageNet-22k [15] datasets, as well as the LVD-142M dataset originally used in DINOv2 [35].
Dataset Splits | No | Insufficient information. The paper mentions training and testing, but no explicit details about a validation split for hyperparameter tuning or early stopping are provided within the main text.
Hardware Specification | Yes | The pre-training code performs 100 epochs in 27 hours on 5 compute nodes of 8 A100-80GB GPUs each (40 GPUs in total), where 1 epoch is set to 1250 iterations. (The implied totals are worked out after the table.)
Software Dependencies | No | Insufficient information. The paper refers to the DINOv2 algorithm and repository but does not list specific software dependencies with version numbers (e.g., PyTorch version, CUDA version).
Experiment Setup | Yes | Hyperparameters and training regimes. The original DINOv2 repository proposes two sets of hyperparameters for training SSL models. The first set (that we refer to as the low-compute regime) corresponds to a setup for fast experimental iterations, designed to run for 100 epochs (125k iterations) with a batch size of 2048. This setup is optimized for performance on the ImageNet-1k dataset (corresponding to the low-data regime). The second set (high-compute regime) is designed for longer training runs of 500 epochs (625k iterations) and is optimized for performance on larger datasets, such as ImageNet-22k (the high-data regime). (Both regimes are summarized in a sketch after the table.)
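To make the "only cropping without resizing" claim from the Research Type row concrete, below is a minimal sketch of a crop-only multi-view pipeline. It is an illustration under our own assumptions (fixed 224-pixel crops via torchvision), not the paper's exact data pipeline; the official repository is the reference for that.

```python
# Minimal sketch of a crop-only multi-view pipeline (our assumption, not the
# paper's exact implementation): crops are cut at a fixed size and never
# rescaled, and no photometric augmentation is applied.
from torchvision import transforms

crop_only = transforms.Compose([
    transforms.RandomCrop(224, pad_if_needed=True),  # crop without resizing
    transforms.ToTensor(),
])

def make_views(image, n_views=2):
    """Return `n_views` crop-only views of a PIL image for a joint-embedding loss."""
    return [crop_only(image) for _ in range(n_views)]
```

By contrast, common SSL pipelines chain random resized crops with color jittering, grayscale conversion, blur, and solarization; the paper's argument is that, with enough training data, those hand-crafted invariances are no longer needed.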
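Because the paper itself gives no pseudocode, the following is a generic sketch of a DINO-style joint-embedding training step: a student and an EMA teacher see different crops, and the student is trained to match the teacher's centered, sharpened prototype distribution. All names and hyperparameter values are our own illustrative assumptions, and the actual DINOv2 objective additionally includes patch-level (iBOT) and regularization terms omitted here.

```python
# Generic DINO-style training step (sketch under assumptions, not the paper's
# exact algorithm): two views, a student network, an EMA teacher, and a
# cross-view cross-entropy on centered/sharpened prototype scores.
import torch
import torch.nn.functional as F

def dino_style_step(student, teacher, view_a, view_b, optimizer, center,
                    t_student=0.1, t_teacher=0.04, ema=0.996):
    with torch.no_grad():                      # the teacher is never backpropagated
        raw_a, raw_b = teacher(view_a), teacher(view_b)
        t_a = F.softmax((raw_a - center) / t_teacher, dim=-1)
        t_b = F.softmax((raw_b - center) / t_teacher, dim=-1)

    s_a = F.log_softmax(student(view_a) / t_student, dim=-1)
    s_b = F.log_softmax(student(view_b) / t_student, dim=-1)

    # Cross-view cross-entropy: teacher(view A) supervises student(view B), and vice versa.
    loss = -(t_a * s_b).sum(-1).mean() - (t_b * s_a).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with torch.no_grad():                      # teacher tracks the student via EMA
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(ema).add_(p_s.detach(), alpha=1 - ema)
        # Center of teacher outputs, also tracked as an EMA, to avoid collapse.
        center = 0.9 * center + 0.1 * torch.cat([raw_a, raw_b]).mean(0)

    return loss.item(), center
```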
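For the Open Source Code row, the released models can also be consumed directly. A usage sketch, assuming the torch.hub entry points documented in the DINOv2 README (e.g. dinov2_vitl14) are available:

```python
# Loading a pretrained DINOv2 backbone via torch.hub (assumes the entry points
# documented in the facebookresearch/dinov2 README, e.g. "dinov2_vitl14").
import torch

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
backbone.eval()

with torch.no_grad():
    # A ViT with 14x14 patches expects side lengths that are multiples of 14.
    feats = backbone(torch.randn(1, 3, 224, 224))
print(feats.shape)  # expected: torch.Size([1, 1024]) for the ViT-L/14 backbone
```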
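The Hardware Specification row implies a few totals worth spelling out. The arithmetic below uses only the figures quoted in the table (the batch size of 2048 comes from the Experiment Setup row):

```python
# Back-of-the-envelope cost of the low-compute regime, using only the numbers
# quoted in the table above.
nodes, gpus_per_node = 5, 8
epochs, iters_per_epoch = 100, 1250
batch_size = 2048
wall_clock_hours = 27

total_gpus = nodes * gpus_per_node                    # 40 A100-80GB GPUs
total_iters = epochs * iters_per_epoch                # 125,000 iterations
images_seen = total_iters * batch_size                # ~256M image views
gpu_hours = total_gpus * wall_clock_hours             # 1,080 GPU-hours
throughput = images_seen / (wall_clock_hours * 3600)  # ~2,600 images/s

print(total_gpus, total_iters, images_seen, gpu_hours, round(throughput))
```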
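Finally, the two training regimes from the Experiment Setup row can be summarized as follows; the dictionaries are our own shorthand for the quoted hyperparameters, not the repository's actual config keys.

```python
# Shorthand summary of the two hyperparameter regimes described above
# (illustrative field names, not the DINOv2 repository's config schema).
LOW_COMPUTE = {
    "epochs": 100,
    "iterations": 125_000,   # 100 epochs x 1,250 iterations
    "batch_size": 2048,
    "tuned_for": "ImageNet-1k (low-data regime)",
}

HIGH_COMPUTE = {
    "epochs": 500,
    "iterations": 625_000,   # 500 epochs x 1,250 iterations
    "batch_size": None,      # not stated for this regime in the excerpt above
    "tuned_for": "larger datasets such as ImageNet-22k (high-data regime)",
}
```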