Fix your classifier: the marginal value of training the last weight layer
Authors: Elad Hoffer, Itay Hubara, Daniel Soudry
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 EXPERIMENTAL RESULTS Table 1: Validation accuracy results on learned vs. fixed classifier We trained a residual network of He et al. (2016) on the Cifar10 dataset. |
| Researcher Affiliation | Academia | Elad Hoffer, Itay Hubara, Daniel Soudry Department of Electrical Engineering Technion Haifa, 320003, Israel elad.hoffer, itay.hubara, daniel.soudry@gmail.com |
| Pseudocode | No | The information is insufficient. The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Table 1 summarizes our fixed-classifier results on convolutional networks, comparing to originally reported results. We offer our drop-in replacement for learned classifier that can be used to train models with fixed classifiers and replicate our results. Code is available at https://github.com/eladhoffer/fix_your_classifier |
| Open Datasets | Yes | We used the well known Cifar10 and Cifar100 datasets by Krizhevsky (2009) as an initial test-bed to explore the idea of a fixed classifier. In order to validate our results on a more challenging dataset, we used the Imagenet dataset introduced by Deng et al. (2009). |
| Dataset Splits | Yes | Cifar10 is an image classification benchmark dataset containing 50,000 training images and 10,000 test images. The results shown in figure 2 demonstrate that although the training error is considerably lower for the network with learned classifier, both models achieve the same classification accuracy on the validation set. |
| Hardware Specification | No | The information is insufficient. The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The information is insufficient. The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We used a network of depth 56 and the same hyper-parameters used in the original work. We compared two variants: the original model with a learned classifier, and our version, where a fixed transformation is used. In all experiments the α scale parameter was regularized with the same weight decay coefficient used on original classifier. |
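The setup row above describes replacing the learned final layer with a fixed transformation scaled by a single learned parameter α. A minimal sketch of that idea, assuming a Sylvester-constructed Hadamard matrix as the fixed projection (the paper's code at the linked repository is the authoritative implementation; `feature_dim`, `num_classes`, and `alpha` values here are illustrative):

```python
def hadamard(n):
    """Build an n x n Hadamard matrix via the Sylvester construction.

    n must be a power of two; rows are mutually orthogonal with
    entries in {+1, -1}.
    """
    assert n >= 1 and (n & (n - 1)) == 0, "n must be a power of two"
    H = [[1]]
    while len(H) < n:
        # Sylvester doubling: [[H, H], [H, -H]]
        H = [row + row for row in H] + \
            [row + [-v for v in row] for row in H]
    return H


def fixed_classifier_logits(features, num_classes, alpha=1.0):
    """Project a feature vector onto the first `num_classes` rows of a
    fixed Hadamard matrix; `alpha` is the single trainable scalar that,
    per the paper, would be regularized with weight decay."""
    dim = len(features)
    H = hadamard(dim)
    # 1/sqrt(dim) normalizes each +/-1 row to unit Euclidean norm.
    scale = alpha / dim ** 0.5
    return [scale * sum(h * f for h, f in zip(row, features))
            for row in H[:num_classes]]
```

Because the projection is fixed, only α (plus the backbone) receives gradient updates, which is the source of the parameter savings the paper reports for the classifier layer.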