Self-Supervised Learning with Kernel Dependence Maximization
Authors: Yazhe Li, Roman Pogodin, Danica J. Sutherland, Arthur Gretton
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our experimental setup, where we assess the performance of the representation learned with SSL-HSIC both with and without a target network. For evaluation, we retain the backbone as a feature extractor for downstream tasks. We evaluate the representation on various downstream tasks including classification, object segmentation, object detection and depth estimation. |
| Researcher Affiliation | Collaboration | Yazhe Li (DeepMind and Gatsby Unit, UCL) yazhe@google.com; Roman Pogodin (Gatsby Unit, UCL) roman.pogodin.17@ucl.ac.uk; Danica J. Sutherland (UBC and Amii) dsuth@cs.ubc.ca; Arthur Gretton (Gatsby Unit, UCL) arthur.gretton@gmail.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/deepmind/ssl_hsic. |
| Open Datasets | Yes | For evaluation, we retain the backbone as a feature extractor for downstream tasks. We evaluate the representation on various downstream tasks including classification, object segmentation, object detection and depth estimation. (The downstream evaluations use the ImageNet validation set, a publicly available benchmark; see Dataset Splits below.) |
| Dataset Splits | Yes | Table 1 reports the top-1 and top-5 accuracies obtained with SSL-HSIC on the ImageNet validation set, and compares to previous self-supervised learning methods. |
| Hardware Specification | Yes | We train the model with a batch size of 4096 on 128 Cloud TPU v4 cores. |
| Software Dependencies | No | The paper mentions the 'LARS optimizer' but does not list software libraries or frameworks with version numbers needed for reproducibility. |
| Experiment Setup | Yes | The output of the encoder is a 2048-dimension embedding vector, which is the representation used for downstream tasks. As in BYOL [25], our projector g and predictor q networks are 2-layer MLPs with 4096 hidden dimensions and 256 output dimensions. The outputs of the networks are batch-normalized and rescaled to unit norm before computing the loss. We use an inverse multiquadric kernel (IMQ) for the latent representation (approximated with 512 random Fourier features that are resampled at each step; see Appendix C for details) and a linear kernel for labels. γ in (4) is set to 3. |
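
Below is a minimal NumPy sketch of the kernel choices described in the Experiment Setup row: an inverse multiquadric (IMQ) kernel on the latent features, a linear kernel on the (one-hot) image identities, and a biased HSIC estimator combined into the SSL-HSIC objective referenced by γ in (4), which we read as -HSIC(Z, Y) + γ·sqrt(HSIC(Z, Z)) with γ = 3. The function names, the IMQ scale parameter `c`, and the use of the exact (non-RFF) estimator are illustrative assumptions; the paper instead approximates the IMQ kernel with 512 random Fourier features (Appendix C), which is not reproduced here.

```python
import numpy as np

def imq_kernel(x, y, c=1.0):
    """IMQ kernel, one common parameterization: k(x, y) = c / sqrt(c**2 + ||x - y||**2)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return c / np.sqrt(c ** 2 + sq_dists)

def hsic_biased(K, L):
    """Biased HSIC estimator: trace(K H L H) / (n - 1)**2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def ssl_hsic_loss(z1, z2, gamma=3.0):
    """Assumed form of the SSL-HSIC objective for two augmented views z1, z2 of the same images."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)   # stack both views: (2n, d)
    y = np.tile(np.eye(n), (2, 1))         # one-hot image identity for each view: (2n, n)
    K = imq_kernel(z, z)                   # IMQ kernel on latent features
    L_y = y @ y.T                          # linear kernel on identity labels
    return -hsic_biased(K, L_y) + gamma * np.sqrt(hsic_biased(K, K))

# Toy usage with the 256-dimensional, unit-norm projector/predictor outputs from the table.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(32, 256)); z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = rng.normal(size=(32, 256)); z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
print(ssl_hsic_loss(z1, z2))
```

The random Fourier feature approximation in the released code replaces the O(n²) kernel matrices above with explicit feature maps resampled at each step, which is what makes the batch size of 4096 practical.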