LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

Authors: Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that LiDAR significantly surpasses naive rank based approaches in its predictive power of optimal hyperparameters. Our proposed criterion presents a more robust and intuitive means of assessing the quality of representations within JE architectures, which we hope facilitates broader adoption of these powerful techniques in various domains.
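To make the contrast in this row concrete, the sketch below computes a naive effective-rank score on raw embeddings alongside a LiDAR-style score computed on linear-discriminant (between-class versus within-class) statistics. This is a minimal illustration, not the authors' reference implementation: the function names, the `delta` regularizer, and the smooth-rank formulation are assumptions, and the "classes" are taken to be clean samples grouped with their augmented views.

```python
# Minimal sketch, assuming a smooth-rank (entropy-of-spectrum) formulation.
# Not the paper's reference code; names and the delta regularizer are illustrative.
import torch


def smooth_rank(eigenvalues: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Exponential of the Shannon entropy of the normalized eigenvalue spectrum."""
    p = eigenvalues.clamp(min=0) + eps
    p = p / p.sum()
    return torch.exp(-(p * p.log()).sum())


def naive_rank_metric(embeddings: torch.Tensor) -> torch.Tensor:
    """Rank-based baseline: smooth rank of the embedding covariance."""
    z = embeddings - embeddings.mean(dim=0, keepdim=True)
    cov = z.T @ z / (z.shape[0] - 1)
    return smooth_rank(torch.linalg.eigvalsh(cov))


def lidar_style_metric(embeddings: torch.Tensor, labels: torch.Tensor,
                       delta: float = 1e-4) -> torch.Tensor:
    """LiDAR-style score: smooth rank of the within-class-whitened
    between-class covariance, with clean samples acting as classes."""
    d = embeddings.shape[1]
    classes, inverse = labels.unique(return_inverse=True)
    mu = embeddings.mean(dim=0)
    class_means = torch.stack(
        [embeddings[inverse == i].mean(dim=0) for i in range(len(classes))])
    # Between-class scatter of per-class mean embeddings.
    sigma_b = (class_means - mu).T @ (class_means - mu) / len(classes)
    # Within-class scatter over augmented views, regularized by delta * I.
    diffs = embeddings - class_means[inverse]
    sigma_w = diffs.T @ diffs / embeddings.shape[0] + delta * torch.eye(d)
    # Whiten the between-class scatter by sigma_w^{-1/2} on both sides.
    evals_w, evecs_w = torch.linalg.eigh(sigma_w)
    w_inv_sqrt = evecs_w @ torch.diag(evals_w.clamp(min=delta).rsqrt()) @ evecs_w.T
    lda = w_inv_sqrt @ sigma_b @ w_inv_sqrt
    lda = (lda + lda.T) / 2  # symmetrize before the eigendecomposition
    return smooth_rank(torch.linalg.eigvalsh(lda))


# Toy usage: 512 clean samples ("classes"), 4 augmented views each, 256-d embeddings.
z = torch.randn(512 * 4, 256)
y = torch.arange(512).repeat_interleave(4)
print(naive_rank_metric(z).item(), lidar_style_metric(z, y).item())
```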
Researcher Affiliation | Industry | Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Josh Susskind, Etai Littwin (Apple). Correspondence to: {vthilak, elittwin}@apple.com
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper refers to existing open-source implementations that were used (e.g., VISSL, VICReg's reference implementation, DINO's reference implementation, data2vec's reference implementation), but does not state that the code specific to this paper's methodology or experiments is open-sourced.
Open Datasets | Yes | We use the Imagenet-1k dataset (Russakovsky et al., 2015) for all experiments. We use the train split as the source dataset for pretraining and linear probing, and use the test split as the target dataset.
Dataset Splits | Yes | For each pretrained checkpoint, we train a linear probe on the train split, which we denote as the oracle, and record its test performance on the test split. ... The representations from the backbone are evaluated via standard linear probing by training a linear layer on the ImageNet-1k training split and calculating test accuracy on the validation split.
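As a reference for the protocol quoted in this row, the sketch below extracts frozen-backbone features and evaluates a linear head's top-1 accuracy. It is a minimal sketch under stated assumptions: `backbone`, `train_loader`, and `val_loader` are placeholders for a pretrained model and ImageNet-1k data loaders, not objects defined in the paper.

```python
# Minimal linear-probing sketch. `backbone` and the data loaders are assumed
# placeholders for a frozen pretrained model and ImageNet-1k train/validation loaders.
import torch
import torch.nn as nn


@torch.no_grad()
def extract_features(backbone: nn.Module, loader) -> tuple[torch.Tensor, torch.Tensor]:
    """Run the frozen backbone over a loader and collect features and labels."""
    backbone.eval()
    feats, labels = [], []
    for x, y in loader:
        feats.append(backbone(x))
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)


@torch.no_grad()
def top1_accuracy(probe: nn.Linear, feats: torch.Tensor, labels: torch.Tensor) -> float:
    """Top-1 accuracy of the linear probe on held-out (validation-split) features."""
    preds = probe(feats).argmax(dim=1)
    return (preds == labels).float().mean().item()
```

The probe itself would be fit on the train-split features using the optimizer settings quoted in the Experiment Setup row below.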
Hardware Specification | No | The paper mentions that "The feature extraction is done on one GPU while the metrics are implemented on the CPU" for runtime comparison, but it does not specify the model or detailed specifications of the GPU, CPU, or other hardware used for the main experiments.
Software Dependencies | No | The paper mentions various optimizers (Adam, LARS, SGD with Nesterov momentum) and refers to existing implementations for models (VICReg, DINO, SimCLR, data2vec, I-JEPA), but it does not specify version numbers for general software dependencies such as Python, PyTorch, TensorFlow, CUDA, or specific library versions.
Experiment Setup | Yes | We vary different hyperparameters per method. The varied hyperparameters range from optimization-related ones such as learning rate and weight decay, to architecture-specific hyperparameters such as softmax temperature, to data augmentation and masking-based hyperparameters. ... Self-supervised training is run for 600 epochs with an effective batch size of 2048... The probe is optimized with the Adam (Kingma & Ba, 2015) optimizer for 20 epochs with a starting learning rate of 0.01 and a step learning rate schedule where the base learning rate is dropped by a factor of 10 after 15 epochs.
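The probe-training configuration quoted here maps directly onto a standard optimizer-plus-scheduler setup. The sketch below is one way to wire it up, assuming precomputed frozen features; `feat_dim`, `num_classes` (1000 for ImageNet-1k), and the dummy dataset are illustrative placeholders rather than details taken from the paper.

```python
# Sketch of the quoted probe schedule: Adam, 20 epochs, base lr 0.01,
# dropped by a factor of 10 after 15 epochs. Dimensions and data are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

feat_dim, num_classes = 768, 1000          # assumed embedding width; ImageNet-1k classes
probe = nn.Linear(feat_dim, num_classes)

optimizer = torch.optim.Adam(probe.parameters(), lr=0.01)
# StepLR multiplies the lr by gamma every step_size epochs; within 20 epochs
# this yields the single drop at epoch 15 described in the setup.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)
criterion = nn.CrossEntropyLoss()

# Dummy stand-in for precomputed frozen train-split features.
dummy = TensorDataset(torch.randn(4096, feat_dim),
                      torch.randint(0, num_classes, (4096,)))
probe_loader = DataLoader(dummy, batch_size=256, shuffle=True)

for epoch in range(20):
    for feats, labels in probe_loader:
        optimizer.zero_grad()
        loss = criterion(probe(feats), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                        # advance the step schedule once per epoch
```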