No Reason for No Supervision: Improved Generalization in Supervised Models

Authors: Mert Bülent Sarıyıldız, Yannis Kalantidis, Karteek Alahari, Diane Larlus

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively analyze supervised training using multi-scale crops for data augmentation and an expendable projector head, and reveal that the design of the projector allows us to control the trade-off between performance on the training task and transferability. We perform an extensive analysis on how each component affects the learned representations... In Sec. 4.1, we exhaustively study the design of the main components of our setup... Finally, in Sec. 4.3 we plot the performance of multiple variants of the proposed training setup on the training-versus-transfer performance plane, empirically verifying its superiority over the previous state of the art.
Researcher Affiliation | Collaboration | Mert Bülent Sarıyıldız (1,2), Yannis Kalantidis (1), Karteek Alahari (2), Diane Larlus (1); 1: NAVER LABS Europe; 2: Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and pretrained models: https://europe.naverlabs.com/t-rex
Open Datasets | Yes | All our models are trained on the training set of ImageNet-1K (IN1K) (Russakovsky et al., 2015).
Dataset Splits | Yes | All our models are trained on the training set of ImageNet-1K (IN1K) (Russakovsky et al., 2015). ... We measure performance on the training task by evaluating classification accuracy on the IN1K validation set. ... When a val split is not provided for a dataset, we randomly split its train set into two, following the size of train and val splits from either Feng et al. (2022) or Grill et al. (2020). We also created different train/val splits when tuning hyper-parameters with different seeds, thus further increasing the robustness of our scores.
Hardware Specification | Yes | fθ is a ResNet50 (He et al., 2016) encoder, trained for 100 epochs with mixed precision in PyTorch (Paszke et al., 2019) using 4 GPUs where batch norm layers are synchronized. ... Training one of our models takes up to 3 days with 4 V100 GPUs depending on its projector configuration.
Software Dependencies | No | The paper mentions software such as PyTorch, Scikit-learn, Optuna, and torchvision, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | Table 3: Hyper-parameters for training our models on IN1K. Hyper-parameters shared by all models are given in the top part, while those specific to t-ReX and t-ReX* are shown in the bottom part. Optimizer: SGD; Base learning rate: 0.1; Learning rate warmup: linear, 10 epochs; Total batch size: 256; Epochs: 100; τ in Eqs. (1) and (2): 0.1; Number of global crops (Mg): 1; Number of local crops (Ml): 8; Global crop resolution: 224.
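
The Research Type row above describes supervised training with multi-scale crops and an expendable projector head appended to the encoder. Below is a minimal PyTorch sketch of that encoder-plus-projector structure; the projector depth, width, and output dimension used for t-ReX are not quoted in this report, so the 2048 → 2048 → 256 MLP here is an illustrative placeholder rather than the paper's configuration.

```python
import torch.nn as nn
import torchvision


class EncoderWithProjector(nn.Module):
    """Sketch of a ResNet-50 encoder followed by an expendable MLP projector.

    The projector is only used for the IN1K training objective; for transfer,
    it is discarded and the encoder features are used directly.
    """

    def __init__(self, proj_hidden=2048, proj_out=256):  # placeholder sizes
        super().__init__()
        backbone = torchvision.models.resnet50()
        feat_dim = backbone.fc.in_features   # 2048 for ResNet-50
        backbone.fc = nn.Identity()          # keep only the encoder f_theta
        self.encoder = backbone
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, proj_hidden),
            nn.BatchNorm1d(proj_hidden),
            nn.ReLU(inplace=True),
            nn.Linear(proj_hidden, proj_out),
        )

    def forward(self, x, use_projector=True):
        feats = self.encoder(x)              # transferable representation
        return self.projector(feats) if use_projector else feats
```

The paper's observation that the projector design controls the training-task-versus-transfer trade-off would correspond, in this sketch, to varying `proj_hidden`, `proj_out`, and the number of hidden layers.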
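
The Dataset Splits row notes that, when a transfer dataset provides no val split, the train set is split in two with different random seeds, using split sizes from Feng et al. (2022) or Grill et al. (2020). A minimal sketch of such a seeded split follows; the function name and the example sizes are illustrative, not taken from the paper.

```python
import numpy as np


def make_train_val_split(num_samples, val_size, seed=0):
    """Deterministically split `num_samples` indices into train/val subsets."""
    rng = np.random.RandomState(seed)
    indices = rng.permutation(num_samples)
    return indices[val_size:], indices[:val_size]   # train_idx, val_idx


# Example: hold out 5,000 images for validation (sizes are illustrative only);
# changing `seed` yields a different split, as done for hyper-parameter tuning.
train_idx, val_idx = make_train_val_split(num_samples=50_000, val_size=5_000, seed=0)
```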
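
The Hardware Specification row mentions 4 GPUs, mixed precision, and synchronized batch-norm layers. The sketch below shows the standard PyTorch building blocks such a setup implies (SyncBatchNorm conversion, DistributedDataParallel, autocast/GradScaler); the paper's actual launcher, loss, and data pipeline are not reproduced, and the helper names are assumptions of this sketch.

```python
import torch
import torch.nn as nn


def wrap_for_distributed(model, local_rank):
    """Prepare a model for multi-GPU training with synchronized batch norm.

    Assumes torch.distributed.init_process_group(...) has already been called,
    e.g. by a launcher spawning 4 processes (one per GPU).
    """
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = model.cuda(local_rank)
    return nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])


def train_step(model, batch, targets, criterion, optimizer, scaler):
    """One mixed-precision optimization step."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():          # mixed-precision forward pass
        loss = criterion(model(batch), targets)
    scaler.scale(loss).backward()            # scaled backward to avoid underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()


# scaler = torch.cuda.amp.GradScaler() is created once before the training loop.
```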
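
The Experiment Setup row quotes the shared hyper-parameters from Table 3. The sketch below wires those quoted values into an SGD optimizer with a 10-epoch linear warmup; momentum, weight decay, and the post-warmup schedule are not quoted in this report, so the values marked as assumed are placeholders rather than the paper's settings.

```python
import torch

# Shared hyper-parameters quoted from Table 3.
HPARAMS = {
    "base_lr": 0.1,
    "warmup_epochs": 10,     # linear warmup
    "batch_size": 256,
    "epochs": 100,
    "temperature": 0.1,      # tau in Eqs. (1) and (2)
    "num_global_crops": 1,   # Mg
    "num_local_crops": 8,    # Ml
    "global_crop_res": 224,
}


def build_optimizer_and_warmup(model, steps_per_epoch):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=HPARAMS["base_lr"],
        momentum=0.9,        # assumed; not quoted above
        weight_decay=1e-4,   # assumed; not quoted above
    )
    # Linear warmup from a small fraction of the base LR over the first 10 epochs;
    # call warmup.step() once per iteration during those epochs.
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer,
        start_factor=1e-3,
        total_iters=HPARAMS["warmup_epochs"] * steps_per_epoch,
    )
    return optimizer, warmup
```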