Self-Supervised Learning with Kernel Dependence Maximization

Authors: Yazhe Li, Roman Pogodin, Danica J. Sutherland, Arthur Gretton

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present our experimental setup, where we assess the performance of the representation learned with SSL-HSIC both with and without a target network. For evaluation, we retain the backbone as a feature extractor for downstream tasks. We evaluate the representation on various downstream tasks including classification, object segmentation, object detection and depth estimation.
Researcher Affiliation | Collaboration | Yazhe Li (DeepMind and Gatsby Unit, UCL) yazhe@google.com; Roman Pogodin (Gatsby Unit, UCL) roman.pogodin.17@ucl.ac.uk; Danica J. Sutherland (UBC and Amii) dsuth@cs.ubc.ca; Arthur Gretton (Gatsby Unit, UCL) arthur.gretton@gmail.com
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code is available at https://github.com/deepmind/ssl_hsic.
Open Datasets | Yes | For evaluation, we retain the backbone as a feature extractor for downstream tasks. We evaluate the representation on various downstream tasks including classification, object segmentation, object detection and depth estimation.
Dataset Splits | Yes | Table 1 reports the top-1 and top-5 accuracies obtained with SSL-HSIC on the ImageNet validation set, and compares to previous self-supervised learning methods.
Hardware Specification | Yes | We train the model with a batch size of 4096 on 128 Cloud TPU v4 cores.
Software Dependencies | No | The paper mentions the LARS optimizer but does not specify software names or version numbers needed for reproducibility. (A hedged optax sketch follows the table.)
Experiment Setup | Yes | The output of the encoder is a 2048-dimensional embedding vector, which is the representation used for downstream tasks. As in BYOL [25], our projector g and predictor q networks are 2-layer MLPs with 4096 hidden dimensions and 256 output dimensions. The outputs of the networks are batch-normalized and rescaled to unit norm before computing the loss. We use an inverse multiquadric kernel (IMQ) for the latent representation (approximated with 512 random Fourier features that are resampled at each step; see Appendix C for details) and a linear kernel for labels. γ in (4) is set to 3.
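
To make the quoted experiment setup concrete, below is a minimal JAX sketch of the projector/predictor design it describes: 2-layer MLPs with 4096 hidden units and 256 outputs, whose outputs are batch-normalized and rescaled to unit norm, plus the inverse multiquadric (IMQ) kernel on the latents. This is an illustration rather than the authors' implementation: the kernel is evaluated in closed form instead of the 512 random Fourier features used in the paper, and the kernel scale c, the initialization, and the stateless batch-norm stand-in are all assumptions.

```python
# Minimal JAX sketch of the projector/predictor MLP and the IMQ kernel
# described in the Experiment Setup row. Illustrative only, not the
# authors' code; the kernel scale `c` and initialization are assumptions.
import jax
import jax.numpy as jnp


def init_mlp(key, in_dim=2048, hidden=4096, out=256):
    """Initialise a 2-layer MLP (the projector g or predictor q)."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (in_dim, hidden)) / jnp.sqrt(in_dim),
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, out)) / jnp.sqrt(hidden),
        "b2": jnp.zeros(out),
    }


def mlp(params, x):
    """2-layer MLP with ReLU, as in the BYOL-style projector/predictor."""
    h = jax.nn.relu(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]


def normalize_output(z, eps=1e-6):
    """Batch-normalise then rescale to unit norm, as stated in the paper.
    (A stateless whitening stand-in for batch norm; the real model keeps
    running statistics.)"""
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + eps)
    return z / (jnp.linalg.norm(z, axis=-1, keepdims=True) + eps)


def imq_kernel(z1, z2, c=1.0):
    """Exact IMQ kernel k(x, y) = c / sqrt(c^2 + ||x - y||^2). The paper
    instead approximates it with 512 random Fourier features that are
    resampled at every step (Appendix C)."""
    sq_dists = jnp.sum((z1[:, None, :] - z2[None, :, :]) ** 2, axis=-1)
    return c / jnp.sqrt(c ** 2 + sq_dists)


# Example: embed a small batch of 2048-d backbone features.
key = jax.random.PRNGKey(0)
feats = jax.random.normal(key, (8, 2048))   # backbone output
g = init_mlp(jax.random.PRNGKey(1))         # projector
z = normalize_output(mlp(g, feats))         # 256-d, unit-norm latents
K = imq_kernel(z, z)                        # kernel matrix on latents
```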
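
Regarding the Software Dependencies row: the paper names the LARS optimizer but no library or version. One plausible way to instantiate LARS in a JAX stack is via optax's LARS alias, sketched below; the learning rate, weight decay, and toy parameters are placeholders rather than values reported in the paper.

```python
# Hedged sketch: instantiating a LARS optimizer with optax in JAX.
# Hyperparameters below are placeholders, not the paper's settings.
import jax
import jax.numpy as jnp
import optax

params = {"w": jnp.zeros((2048, 256))}            # toy parameter tree
optimizer = optax.lars(learning_rate=0.2, weight_decay=1e-6)
opt_state = optimizer.init(params)

def loss_fn(p):
    return jnp.sum(p["w"] ** 2)                   # stand-in loss

grads = jax.grad(loss_fn)(params)
updates, opt_state = optimizer.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```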