Fix your classifier: the marginal value of training the last weight layer

Authors: Elad Hoffer, Itay Hubara, Daniel Soudry

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "3 EXPERIMENTAL RESULTS ... Table 1: Validation accuracy results on learned vs. fixed classifier ... We trained a residual network of He et al. (2016) on the Cifar10 dataset."
Researcher Affiliation | Academia | "Elad Hoffer, Itay Hubara, Daniel Soudry. Department of Electrical Engineering, Technion, Haifa, 320003, Israel. elad.hoffer, itay.hubara, daniel.soudry@gmail.com"
Pseudocode | No | The information is insufficient. The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | "Table 1 summarizes our fixed-classifier results on convolutional networks, comparing to originally reported results. We offer our drop-in replacement for learned classifier that can be used to train models with fixed classifiers and replicate our results." Code is available at https://github.com/eladhoffer/fix_your_classifier
Open Datasets | Yes | "We used the well known Cifar10 and Cifar100 datasets by Krizhevsky (2009) as an initial test-bed to explore the idea of a fixed classifier. In order to validate our results on a more challenging dataset, we used the Imagenet dataset introduced by Deng et al. (2009)."
Dataset Splits | Yes | "Cifar10 is an image classification benchmark dataset containing 50,000 training images and 10,000 test images. The results shown in figure 2 demonstrate that although the training error is considerably lower for the network with learned classifier, both models achieve the same classification accuracy on the validation set."
Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The information is insufficient. The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | "We used a network of depth 56 and the same hyper-parameters used in the original work. We compared two variants: the original model with a learned classifier, and our version, where a fixed transformation is used. In all experiments the α scale parameter was regularized with the same weight decay coefficient used on original classifier."
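The table above refers to a "drop-in replacement" for the learned classifier: the final layer's weight matrix is frozen to a fixed transformation and only a scalar scale α is trained (and weight-decayed, per the Experiment Setup row). The following is a minimal NumPy sketch of that idea, not the authors' code (the linked repository holds the real implementation); the random orthogonal projection here is an assumption standing in for whatever fixed transformation the paper uses.

```python
import numpy as np

class FixedClassifier:
    """Sketch of a fixed (non-learned) final layer.

    The class templates (rows of `weight`) are frozen at construction;
    only the scalar `alpha` would be updated during training.
    """

    def __init__(self, in_features, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed projection: a random orthogonal matrix is used here as a
        # stand-in (an assumption) for the paper's fixed transformation.
        q, _ = np.linalg.qr(rng.standard_normal((in_features, in_features)))
        self.weight = q[:num_classes]  # (num_classes, in_features), never updated
        self.alpha = 1.0               # the only trainable parameter

    def __call__(self, features):
        # logits = alpha * x @ W_fixed^T ; no bias, W stays frozen
        return self.alpha * features @ self.weight.T

clf = FixedClassifier(in_features=64, num_classes=10)
logits = clf(np.ones((2, 64)))
print(logits.shape)  # (2, 10)
```

In a training loop, the optimizer would receive only `alpha` (with the same weight-decay coefficient the original classifier used), while `weight` is excluded from the parameter list entirely.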