Automatic Shortcut Removal for Self-Supervised Representation Learning
Authors: Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Table 1 shows evaluation results across tasks and datasets. For all tested tasks, adding the lens leads to a significant improvement over the baseline. Further, the lens outperforms adversarial training using the fast gradient sign method (FGSM; Goodfellow et al., 2015; details in the appendix). A minimal FGSM sketch appears below the table. |
| Researcher Affiliation | Industry | Matthias Minderer¹², Olivier Bachem¹, Neil Houlsby¹, Michael Tschannen¹. ¹Google Research, Brain Team, Zürich, Switzerland. ²Work done as part of the Google AI Residency. Correspondence to: Matthias Minderer <mjlm@google.com>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for the methodology it describes. |
| Open Datasets | Yes | Self-supervised training is performed on ImageNet, which contains 1.3 million images, each belonging to one of 1000 object categories. Unless stated otherwise, we use the same preprocessing operations and batch size as Kolesnikov et al. (2019) for the respective tasks. To mitigate distribution shift between raw and lens-processed images, we feed both the lens-processed and the raw images to the feature extraction network (Kurakin et al. (2016) similarly feed processed and raw images for adversarial training). This batching is sketched below the table. |
| Dataset Splits | Yes | We report top-1 classification accuracy on the ImageNet validation set. In addition, to measure how well the learned representations transfer to unseen data, we also report downstream top-1 accuracy on the Places205 dataset. |
| Hardware Specification | Yes | Training was performed on 128 TPU v3 cores for Rotation and Exemplar, and 32 TPU v3 cores for Relative patch location and Jigsaw. |
| Software Dependencies | No | The paper does not specify ancillary software dependencies or version numbers. |
| Experiment Setup | Yes | Feature extractor and lens are trained synchronously using the Adam optimizer with β1 = 0.1, β2 = 10⁻³ and ϵ = 10⁻⁷ for 35 epochs. The learning rate is linearly ramped up from zero to 10⁻⁴ in the first epoch, stays at 10⁻⁴ until the end of the 32nd epoch, and is then linearly decayed to zero. The schedule is sketched below the table. |
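
For context on the adversarial baseline quoted in the Research Type row, here is a minimal PyTorch sketch of one-step FGSM (Goodfellow et al., 2015). The ϵ value, the [0, 1] clamp, and all names are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=8 / 255):
    """One-step FGSM: move the input in the direction of the sign of the
    loss gradient with respect to the input."""
    # eps and the [0, 1] clamp are illustrative choices, not from the paper.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Illustrative usage with a toy classifier on random "images".
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = fgsm_perturb(model, x, y)
```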
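
The Open Datasets row quotes the detail that both lens-processed and raw images are fed through the feature extractor to mitigate distribution shift. A minimal sketch of that batching, with hypothetical `lens` and `feature_net` modules, could look like this:

```python
import torch

def extract_features(feature_net, lens, images):
    """Feed both lens-processed and raw images through the feature extractor,
    as described in the quoted passage; module names are hypothetical."""
    processed = lens(images)
    both = torch.cat([processed, images], dim=0)  # one combined batch
    return feature_net(both)

# Toy stand-ins for the lens and the feature extractor.
lens = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
feature_net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
features = extract_features(feature_net, lens, torch.rand(4, 3, 32, 32))  # -> (8, 128)
```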
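
Finally, the Experiment Setup row describes a warmup-constant-decay learning-rate schedule. Below is a minimal PyTorch sketch of that schedule, assuming a per-step scheduler; `steps_per_epoch` and the placeholder model are assumptions, and the Adam β values are left at framework defaults because the quoted values appear garbled by PDF extraction.

```python
import torch

def lr_multiplier(epoch, warmup=1.0, constant_until=32.0, total=35.0):
    """Schedule from the quoted setup: linear ramp over the first epoch,
    constant plateau until the end of epoch 32, then linear decay to zero
    by the end of epoch 35."""
    if epoch < warmup:
        return epoch / warmup
    if epoch < constant_until:
        return 1.0
    return max(0.0, (total - epoch) / (total - constant_until))

steps_per_epoch = 1000  # assumption; depends on dataset size and batch size
model = torch.nn.Linear(8, 8)  # placeholder for feature extractor + lens
# Peak LR of 1e-4 and eps of 1e-7 are from the quote; betas use defaults.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, eps=1e-7)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: lr_multiplier(step / steps_per_epoch))
```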