Big Self-Supervised Models are Strong Semi-Supervised Learners
Authors: Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey E. Hinton
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the effectiveness of our method on ImageNet ILSVRC-2012 [21] with only 1% and 10% of the labeled images available. Our main findings and contributions can be summarized as follows: ... We combine these findings to achieve a new state-of-the-art in semi-supervised learning on ImageNet as summarized in Figure 2. |
| Researcher Affiliation | Industry | Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton Google Research, Brain Team |
| Pseudocode | No | The paper describes the SimCLRv2 framework and its components verbally and through equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and pretrained checkpoints are available at https://github.com/google-research/simclr. |
| Open Datasets | Yes | evaluate the proposed method on ImageNet ILSVRC-2012 [21]. |
| Dataset Splits | Yes | While all 1.28 million images are available, only a randomly sub-sampled 1% (12811) or 10% (128116) of images are associated with labels. (...) For fine-tuning... we fine-tune for 60 epochs with 1% of labels, and 30 epochs with 10% of labels, as well as full ImageNet labels. (A hedged sketch of this label subsampling appears after the table.) |
| Hardware Specification | Yes | For pretraining, similar to [1], we train our model on 128 Cloud TPUs, with a batch size of 4096 and global batch normalization [33], for a total of 800 epochs. |
| Software Dependencies | No | The paper mentions software components like 'LARS optimizer', 'global batch normalization', 'ResNet', and 'SimCLR' but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | For pretraining... train our model on 128 Cloud TPUs, with a batch size of 4096 and global batch normalization [33], for a total of 800 epochs. The learning rate is linearly increased for the first 5% of epochs, reaching a maximum of 6.4 (= 0.1 × sqrt(Batch Size)), and then decayed with a cosine decay schedule. A weight decay of 1e-4 is used. (...) For fine-tuning... we use a much smaller learning rate, i.e. 0.16 (= 0.005 × sqrt(Batch Size)) for standard ResNets [25], and 0.064 (= 0.002 × sqrt(Batch Size)) for larger ResNet variants (...). A batch size of 1024 is used. Similar to [1], we fine-tune for 60 epochs with 1% of labels, and 30 epochs with 10% of labels, as well as full ImageNet labels. For distillation... models are trained for 400 epochs. (A hedged sketch of the learning-rate schedule appears after the table.) |
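
The Dataset Splits row quotes the paper's 1%/10% labeled-subset protocol: all 1.28 million ImageNet training images are available, but only a small random fraction keeps its labels. The sketch below illustrates drawing such a subset; it assumes uniform random sampling, whereas the paper follows the standard pre-defined 1%/10% ImageNet label subsets, and the function name `subsample_labeled_indices` is illustrative rather than taken from the authors' code.

```python
import random

def subsample_labeled_indices(num_images=1_281_167, fraction=0.01, seed=0):
    """Draw a random subset of training-image indices that keep their labels.

    With fraction=0.01 this yields ~12,811 labeled images and with
    fraction=0.10 ~128,116, matching the counts quoted above; the remaining
    images are treated as unlabeled for pretraining and distillation.
    (Illustrative sketch, not the official subset files.)
    """
    rng = random.Random(seed)
    labeled = rng.sample(range(num_images), int(fraction * num_images))
    return sorted(labeled)
```

In practice one would load the officially released 1%/10% ImageNet subsets rather than re-sampling, so that results are comparable to prior semi-supervised work.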
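The Experiment Setup row describes the pretraining learning-rate schedule only in words: linear warmup over the first 5% of training to a peak of 0.1 × sqrt(4096) = 6.4, followed by cosine decay, with weight decay 1e-4. Below is a minimal, framework-agnostic sketch of that schedule; the function name and the step-based parameterization are assumptions, not code from the authors' repository.

```python
import math

def pretrain_lr(step, total_steps, batch_size=4096,
                lr_scale=0.1, warmup_fraction=0.05):
    """Sketch of the described schedule: linear warmup to
    lr_scale * sqrt(batch_size) (= 6.4 at batch size 4096) over the first
    5% of steps, then cosine decay to zero over the remaining steps."""
    peak_lr = lr_scale * math.sqrt(batch_size)
    warmup_steps = max(1, int(warmup_fraction * total_steps))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps      # linear warmup
    # cosine decay from peak_lr to 0 over the post-warmup steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

The fine-tuning rates quoted above follow the same sqrt scaling at batch size 1024: 0.005 × sqrt(1024) = 0.16 for standard ResNets and 0.002 × sqrt(1024) = 0.064 for the larger variants.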