Self-supervised learning with rotation-invariant kernels

Authors: Léon Zheng, Gilles Puy, Elisa Riccietti, Patrick Perez, Rémi Gribonval

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Numerically, we show in a rigorous experimental setting with a separate validation set for hyperparameter tuning that our method yields fully competitive results compared to the state of the art, when choosing truncated kernels of the form K(u, v) = Σ_{ℓ=0}^{L} bℓ Pℓ(q; u⊤v), with L ∈ {2, 3}, bℓ ≥ 0 for ℓ ∈ {0, ..., L}, where Pℓ(q; ·) denotes the Legendre polynomial of order ℓ in dimension q. To our knowledge, this kernel choice has not been considered in previous self-supervision methods. Therefore, we introduce SFRIK (Self-supervised learning with Rotation-Invariant Kernels, pronounced like "spheric"), which regularizes the embedding distribution to be close to the uniform distribution with respect to the MMD associated with such a truncated kernel, as summarized in Figure 1.
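The truncated kernel above can be sketched numerically. The snippet below is a minimal NumPy illustration (the paper's released code uses PyTorch): it evaluates the dimension-q Legendre polynomials Pℓ(q; ·), normalized so that Pℓ(q; 1) = 1, via the standard three-term recurrence, and estimates the MMD-based uniformity term over a batch of embeddings. It relies on the fact that for ℓ ≥ 1, Pℓ(q; ·) averages to zero under the uniform distribution on the sphere, so up to an additive constant the loss reduces to the mean of Σ_{ℓ≥1} bℓ Pℓ(q; ⟨z_i, z_j⟩) over off-diagonal pairs; function names are mine, not the paper's.

```python
import numpy as np

def legendre(q, L, t):
    """Legendre polynomials P_0..P_L in dimension q, evaluated at t
    (normalized so that P_l(q; 1) = 1), via the three-term recurrence
    P_l = ((2l + q - 4) t P_{l-1} - (l - 1) P_{l-2}) / (l + q - 3).
    For q = 3 this recovers the classical Legendre polynomials."""
    polys = [np.ones_like(t), t]
    for l in range(2, L + 1):
        p = ((2 * l + q - 4) * t * polys[-1] - (l - 1) * polys[-2]) / (l + q - 3)
        polys.append(p)
    return polys[: L + 1]

def uniformity_loss(z, weights):
    """MMD^2-style uniformity term for a batch of embeddings z (n x q):
    mean of sum_l b_l P_l(q; <z_i, z_j>) over off-diagonal pairs of the
    L2-normalized rows, with weights = (b_1, ..., b_L).  The constant
    P_0 term is dropped since it only shifts the loss."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n, q = z.shape
    gram = z @ z.T                                 # pairwise cosine similarities
    polys = legendre(q, len(weights), gram)[1:]    # P_1, ..., P_L
    k = sum(b * p for b, p in zip(weights, polys))
    mask = ~np.eye(n, dtype=bool)                  # exclude self-similarities
    return k[mask].mean()
```

Minimizing this term pushes the pairwise cosine similarities toward the statistics of uniformly distributed points on the sphere, which is the regularization effect the paper attributes to SFRIK.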
Researcher Affiliation Collaboration Léon Zheng 1,2; Gilles Puy 1; Elisa Riccietti 2; Patrick Pérez 1; Rémi Gribonval 2. 1: valeo.ai, Paris, France. 2: Univ Lyon, EnsL, UCBL, CNRS, Inria, LIP, F-69342, Lyon Cedex 07, France.
Pseudocode No No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code Yes Code: https://github.com/valeoai/sfrik
Open Datasets Yes We first demonstrate numerically that the regularization loss (8) of SFRIK outperforms existing alternatives, in a rigorous experimental setting with a subset of ImageNet-1000 (Deng et al., 2009) for pretraining and a separate validation set for hyperparameter tuning.
Dataset Splits Yes In contrast to the common practice in the literature where hyperparameters are directly selected on the evaluation dataset, we choose to tune hyperparameters on a separate validation set that consists of another 20% subset of the ImageNet train set, and we finally report the evaluation results by linear probing on the usual ImageNet validation set, which is never seen during hyperparameter tuning.
Hardware Specification Yes We measure the peak memory per GPU during pretraining on IN100% with a batch size of 2048 and the pretraining wall time of both methods on 8 AMD Radeon Instinct MI50 32GB GPUs: at q = 8192, SFRIK is 8% faster than VICReg and needs 3% less memory per GPU; at q = 16384, SFRIK is 19% faster than VICReg and needs 8% less memory per GPU; at q = 32768, SFRIK is still 2% faster than VICReg run in the lower dimension 16384. It only requires 30.9GB per GPU, while VICReg at q = 32768 needs more than the available memory.
Software Dependencies No Our experiments include the following image augmentations implemented by PyTorch (torchvision.transforms), and we then learn a linear SVM with LIBLINEAR (Fan et al., 2008) on top of these features. No specific version numbers are provided for these software components.
Experiment Setup Yes We fix the batch size at 2048, and tune the base learning rate and hyperparameters specific to each method's loss. We also compare different embedding dimensions q ∈ {1024, 2048, 4096, 8192}. In order to perform an extensive hyperparameter tuning by grid search of each method for fair comparisons, we choose a smaller backbone and a reduced dataset for pretraining, i.e., we pretrain a ResNet-18 on IN20% for 100 epochs with all methods.
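The grid search described above can be sketched as follows. Only the fixed batch size (2048) and the four embedding dimensions are taken from the quoted setup; the base-learning-rate values are hypothetical placeholders, since the actual tuned grid is not given in this excerpt.

```python
from itertools import product

# Taken from the quoted setup: batch size fixed at 2048, four embedding dims.
embedding_dims = [1024, 2048, 4096, 8192]
# Hypothetical placeholder values -- the excerpt does not list the lr grid.
base_lrs = [0.2, 0.4, 0.8]

# One pretraining run (ResNet-18 on IN20%, 100 epochs) per configuration.
configs = [
    {"batch_size": 2048, "q": q, "base_lr": lr}
    for q, lr in product(embedding_dims, base_lrs)
]
print(len(configs))  # 12 configurations
```

Each configuration would then be scored by linear probing on the separate 20% validation split described above, never on the evaluation set.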