Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
Authors: Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate several use cases of the loss. As an analytical tool, it provides insights into the evolution of class similarity structures during learning. Surprisingly, we find that maximizing the entanglement of representations of different classes in the hidden layers is beneficial for discrimination in the final layer, possibly because it encourages representations to identify class-independent similarity structures. Maximizing the soft nearest neighbor loss in the hidden layers leads not only to better-calibrated estimates of uncertainty on outlier data but also to marginally improved generalization. Data that is not from the training distribution can be recognized by observing that in the hidden layers, it has fewer than the normal number of neighbors from the predicted class. We trained a convolutional network on MNIST, Fashion-MNIST and SVHN, as well as a ResNet on CIFAR10. Two variants of each model were trained with a different objective: (1) a baseline with cross-entropy only and (2) an entangled variant balancing both cross-entropy and the soft nearest neighbor loss as per Equation 3. As reported in Table 1, all entangled models marginally outperformed their non-entangled counterparts. |
| Researcher Affiliation | Industry | Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton (Google Brain). Correspondence to: N. Frosst <frosst@google.com>, N. Papernot <papernot@google.com>. |
| Pseudocode | No | The paper provides mathematical definitions and descriptions but does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We open-sourced TensorFlow code outlining the matrix operations needed to compute this loss efficiently. (A minimal sketch of this computation appears after the table.) |
| Open Datasets | Yes | We trained a convolutional network on MNIST, Fashion-MNIST and SVHN, as well as a ResNet on CIFAR10. |
| Dataset Splits | Yes | Figure 7. Test accuracy as a function of the soft nearest neighbor hyper-parameter α for 64 training runs of a ResNet on CIFAR10. Each run is selected by Vizier (Golovin et al., 2017) to maximize validation accuracy by tuning the learning rate, SNNL hyper-parameter α, and temperature T. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions 'TensorFlow code' but does not specify any version numbers for TensorFlow or other software dependencies. |
| Experiment Setup | Yes | The architecture we used was made up of two convolutional layers followed by three fully connected layers and a final softmax layer. The network was trained with Adam at a learning rate of 1e-4 and a batch size of 256 for 14,000 steps. The ResNet v2 with 15 layers was trained for 106 epochs with an exponentially decreasing learning rate starting at 0.4. (A hedged sketch of the corresponding training objective follows the table.) |
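
The paper states that TensorFlow code outlining the matrix operations for the loss was open-sourced. The sketch below is not that release; it is a minimal TensorFlow reconstruction of the loss as described in the paper (pairwise squared distances scaled by a temperature T, self-pairs excluded, numerator restricted to same-class pairs), with the function name and the `eps` stabilizer chosen here for illustration.

```python
import tensorflow as tf


def soft_nearest_neighbor_loss(features, labels, temperature=1.0):
    """Batched soft nearest neighbor loss (sketch, not the authors' release).

    features: [batch, ...] hidden activations; flattened to [batch, dim].
    labels:   [batch] integer class labels.
    """
    batch = tf.shape(features)[0]
    features = tf.cast(tf.reshape(features, [batch, -1]), tf.float32)

    # Pairwise squared Euclidean distances via ||a||^2 - 2ab + ||b||^2.
    sq_norms = tf.reduce_sum(tf.square(features), axis=1, keepdims=True)
    distances = (sq_norms
                 - 2.0 * tf.matmul(features, features, transpose_b=True)
                 + tf.transpose(sq_norms))

    # Kernel similarities at temperature T, with self-pairs (i == j) zeroed out.
    similarities = tf.exp(-distances / temperature)
    similarities *= 1.0 - tf.eye(batch, dtype=tf.float32)

    # Numerator keeps only pairs sharing a label; denominator keeps all pairs.
    same_class = tf.cast(tf.equal(labels[:, None], labels[None, :]), tf.float32)
    numerator = tf.reduce_sum(similarities * same_class, axis=1)
    denominator = tf.reduce_sum(similarities, axis=1)

    eps = 1e-8  # assumed stabilizer: guards against empty same-class neighborhoods
    return -tf.reduce_mean(tf.math.log(eps + numerator / (eps + denominator)))
```

Minimizing this quantity pulls same-class points together; the entangled models instead weight it so that entanglement in the hidden layers is maximized, which the composite objective sketched next makes explicit.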
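
The entangled variants balance cross-entropy against the SNNL summed over hidden layers, as per the paper's Equation 3, with the balance controlled by the hyper-parameter α tuned alongside the temperature T. The snippet below is a hedged illustration of that balance rather than the paper's exact formulation: the sign convention for α is an assumption, and `hidden_layers` is a placeholder for whichever intermediate activations the model exposes.

```python
def composite_loss(logits, hidden_layers, labels, alpha, temperature):
    """Cross-entropy plus an alpha-weighted SNNL term over hidden layers.

    Sign convention is assumed here: written this way, a negative alpha
    rewards maximizing entanglement in the hidden layers, the behaviour
    the paper reports as beneficial.
    """
    xent = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                       logits=logits))
    snnl = tf.add_n([
        soft_nearest_neighbor_loss(h, labels, temperature)
        for h in hidden_layers
    ])
    return xent + alpha * snnl
```

A training loop matching the reported setup would minimize this composite loss with Adam at a learning rate of 1e-4 and a batch size of 256; per the Figure 7 quote above, the learning rate, α, and T were tuned jointly with Vizier for the CIFAR10 ResNet.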