Why Do Better Loss Functions Lead to Less Transferable Features?
Authors: Simon Kornblith, Ting Chen, Honglak Lee, Mohammad Norouzi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. |
| Researcher Affiliation | Collaboration | ¹Google Research, Toronto; ²University of Michigan |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | trained 8 ResNet-50 [29, 27] models on the ImageNet ILSVRC 2012 dataset [20, 57]. To tune loss hyperparameters and the epoch for early stopping, we performed 3 training runs per hyperparameter configuration where we held out a validation set of 50,046 ImageNet training examples. |
| Dataset Splits | Yes | To tune loss hyperparameters and the epoch for early stopping, we performed 3 training runs per hyperparameter configuration where we held out a validation set of 50,046 ImageNet training examples. |
| Hardware Specification | Yes | All models were trained on Google’s Cloud TPUs, primarily on Cloud TPU v3-8 devices. |
| Software Dependencies | No | The paper states: 'Our models were implemented in JAX [16] and Flax [22].' However, it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We carefully tune hyperparameters of each loss function... We provide hyperparameters in Appendix A.1. ... We provide further details regarding the experimental setup in Appendix A. |
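
For reference, the "vanilla softmax cross-entropy" baseline against which the paper's alternative losses are compared is the standard mean negative log-likelihood over one-hot labels. The paper's own code is not released, so the sketch below is only an illustrative JAX implementation (the paper reports using JAX and Flax); the function name and shapes are assumptions, not the authors' code.

```python
import jax
import jax.numpy as jnp

def softmax_cross_entropy(logits, labels, num_classes=1000):
    """Mean softmax cross-entropy over a batch (illustrative sketch).

    logits: float array of shape [batch, num_classes]
    labels: integer class ids of shape [batch]
    """
    one_hot = jax.nn.one_hot(labels, num_classes)          # [batch, num_classes]
    log_probs = jax.nn.log_softmax(logits, axis=-1)         # normalized log-probabilities
    return -jnp.mean(jnp.sum(one_hot * log_probs, axis=-1)) # average negative log-likelihood
```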