Automatic Shortcut Removal for Self-Supervised Representation Learning

Authors: Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Table 1 shows evaluation results across tasks and datasets. For all tested tasks, adding the lens leads to a significant improvement over the baseline. Further, the lens outperforms adversarial training using the fast gradient sign method (FGSM; Goodfellow et al., 2015; details in Appendix ??). (See the FGSM sketch after this table.)
Researcher Affiliation | Industry | Matthias Minderer¹ ², Olivier Bachem¹, Neil Houlsby¹, Michael Tschannen¹. ¹Google Research, Brain Team, Zürich, Switzerland. ²Work done as part of the Google AI Residency. Correspondence to: Matthias Minderer <mjlm@google.com>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described method.
Open Datasets | Yes | Self-supervised training is performed on ImageNet, which contains 1.3 million images, each belonging to one of 1000 object categories. Unless stated otherwise, we use the same preprocessing operations and batch size as Kolesnikov et al. (2019) for the respective tasks. To mitigate distribution shift between raw and lens-processed images, we feed both the batch of lens-processed and the raw images to the feature extraction network (Kurakin et al. (2016) similarly feed processed and raw images for adversarial training). (See the mixed-batch sketch after this table.)
Dataset Splits | Yes | We report top-1 classification accuracy on the ImageNet validation set. In addition, to measure how well the learned representations transfer to unseen data, we also report downstream top-1 accuracy on the Places205 dataset.
Hardware Specification | Yes | Training was performed on 128 TPU v3 cores for Rotation and Exemplar and 32 TPU v3 cores for Relative patch location and Jigsaw.
Software Dependencies | No | The paper does not name specific ancillary software dependencies or version numbers.
Experiment Setup | Yes | Feature extractor and lens are trained synchronously using the Adam optimizer with β₁ = 0.1, β₂ = 10⁻³ and ϵ = 10⁻⁷ for 35 epochs. The learning rate is linearly ramped up from zero to 10⁻⁴ in the first epoch, stays at 10⁻⁴ until the end of the 32nd epoch, and is then linearly decayed to zero. (See the schedule sketch after this table.)
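
For readers unfamiliar with the FGSM baseline cited in the Research Type row, here is a minimal one-step FGSM sketch in JAX. The names `ssl_loss` and `params`, and the ϵ value, are hypothetical stand-ins, not details taken from the paper.

```python
import jax
import jax.numpy as jnp

def fgsm_perturb(ssl_loss, params, images, labels, eps=8.0 / 255.0):
    """One-step FGSM (Goodfellow et al., 2015): move each pixel in the
    direction of the sign of the loss gradient w.r.t. the input.
    `ssl_loss(params, images, labels)` is a hypothetical stand-in for
    the self-supervised pretext loss (e.g. rotation prediction)."""
    grads = jax.grad(ssl_loss, argnums=1)(params, images, labels)
    adv_images = images + eps * jnp.sign(grads)
    return jnp.clip(adv_images, 0.0, 1.0)  # keep pixels in a valid range
```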
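
The mixed-batch trick quoted in the Open Datasets row can be sketched as a single concatenated forward pass. `lens_apply` and `feature_extractor` are hypothetical callables standing in for the lens and the feature extraction network.

```python
import jax.numpy as jnp

def mixed_batch_features(lens_params, fx_params, images,
                         lens_apply, feature_extractor):
    """Feed both the lens-processed and the raw images to the feature
    extractor, mitigating distribution shift between the two views
    (cf. Kurakin et al., 2016)."""
    processed = lens_apply(lens_params, images)  # shortcut-removed view
    # Stack both views along the batch axis so they share one forward pass.
    both = jnp.concatenate([processed, images], axis=0)
    return feature_extractor(fx_params, both)
```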
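
The learning-rate schedule quoted in the Experiment Setup row (one-epoch linear warmup to 10⁻⁴, constant through epoch 32, linear decay to zero by the end of epoch 35) can be written as a step-indexed function. A minimal sketch; `steps_per_epoch` is a placeholder for the actual value.

```python
import jax.numpy as jnp

def lens_lr_schedule(step, steps_per_epoch, peak_lr=1e-4):
    """Warmup / constant / linear-decay schedule from the Experiment
    Setup row: 0 -> peak_lr over epoch 1, flat until the end of epoch
    32, then linearly down to 0 at the end of epoch 35."""
    warmup_end = 1 * steps_per_epoch
    decay_start = 32 * steps_per_epoch
    total_steps = 35 * steps_per_epoch
    warmup = peak_lr * step / warmup_end
    decay = peak_lr * (total_steps - step) / (total_steps - decay_start)
    # The minimum of the two ramps, clipped to [0, peak_lr], traces the
    # warmup / plateau / decay shape.
    return jnp.clip(jnp.minimum(warmup, decay), 0.0, peak_lr)
```

A callable like this can be passed directly as the learning_rate of optax.adam, which accepts schedules as functions of the step count.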