Why Do Better Loss Functions Lead to Less Transferable Features?
Authors: Simon Kornblith, Ting Chen, Honglak Lee, Mohammad Norouzi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper studies how the choice of training objective affects the transferability of the hidden representations of convolutional neural networks trained on ImageNet. We show that many objectives lead to statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, but the resulting fixed feature extractors transfer substantially worse to downstream tasks, and the choice of loss has little effect when networks are fully fine-tuned on the new tasks. |
| Researcher Affiliation | Collaboration | ¹Google Research, Toronto; ²University of Michigan |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | trained 8 ResNet-50 [29, 27] models on the ImageNet ILSVRC 2012 dataset [20, 57]. To tune loss hyperparameters and the epoch for early stopping, we performed 3 training runs per hyperparameter configuration where we held out a validation set of 50,046 ImageNet training examples. |
| Dataset Splits | Yes | To tune loss hyperparameters and the epoch for early stopping, we performed 3 training runs per hyperparameter configuration where we held out a validation set of 50,046 ImageNet training examples. |
| Hardware Specification | Yes | All models were trained on Google’s Cloud TPUs, primarily on Cloud TPU v3-8 devices. |
| Software Dependencies | No | The paper states: 'Our models were implemented in JAX [16] and Flax [22].' However, it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We carefully tune hyperparameters of each loss function... We provide hyperparameters in Appendix A.1. ... We provide further details regarding the experimental setup in Appendix A. |
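
For reference, the "vanilla softmax cross-entropy" baseline against which the paper's alternative losses are compared is the standard mean negative log-likelihood over one-hot labels. The paper's own code is not released, so the sketch below is only an illustrative JAX implementation (the paper reports using JAX and Flax); the function name and shapes are assumptions, not the authors' code.

```python
import jax
import jax.numpy as jnp

def softmax_cross_entropy(logits, labels, num_classes=1000):
    """Mean softmax cross-entropy over a batch (illustrative sketch).

    logits: float array of shape [batch, num_classes]
    labels: integer class ids of shape [batch]
    """
    one_hot = jax.nn.one_hot(labels, num_classes)          # [batch, num_classes]
    log_probs = jax.nn.log_softmax(logits, axis=-1)         # normalized log-probabilities
    return -jnp.mean(jnp.sum(one_hot * log_probs, axis=-1)) # average negative log-likelihood
```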