Zero-bias autoencoders and the benefits of co-adapting features
Authors: Kishore Konda, Roland Memisevic, and David Krueger
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work is motivated by the empirical observation that across a wide range of applications, hidden biases, b_k, tend to take on large negative values when training an autoencoder with one of the mentioned regularization schemes. In Figure 1 we confirm this finding, and we show that it is still true when features represent whole CIFAR-10 images (rather than a bag of features). The figure shows the classification performance of a standard contractive autoencoder with sigmoid hidden units trained on the permutation-invariant CIFAR-10 training dataset (i.e., using whole images rather than patches for training), using a linear classifier applied to the hidden activations. A minimal sketch of the zero-bias activations appears after this table. |
| Researcher Affiliation | Academia | Kishore Konda, Goethe University Frankfurt, Germany (konda.kishorereddy@gmail.com); Roland Memisevic, University of Montreal, Canada (roland.memisevic@umontreal.ca); David Krueger, University of Montreal, Canada (david.krueger@umontreal.ca) |
| Pseudocode | No | The paper describes algorithms and mathematical formulations but does not include any clearly labeled pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | An example implementation of the zero-bias autoencoder in Python is available at http://www.iro.umontreal.ca/~memisevr/code/zae/. |
| Open Datasets | Yes | We chose the CIFAR-10 dataset (Krizhevsky & Hinton, 2009). It contains color images of size 32 × 32 pixels that are assigned to 10 different classes. The number of samples for training is 50,000 and for testing is 10,000. We used the recognition pipeline proposed in Le et al. (2011); Konda et al. (2014) and evaluated it on the Hollywood2 dataset (Marszałek et al., 2009). |
| Dataset Splits | Yes | The number of samples for training is 50,000 and for testing is 10,000. We classify the resulting representation using logistic regression with weight decay for classification, with the weight cost parameter estimated using cross-validation on a subset of the training samples of size 10,000. A sketch of this evaluation protocol appears after this table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions that an implementation is available in Python but does not specify version numbers for Python or any other software libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | For all the experiments in this section we chose a learning rate of 0.0001 for a few (e.g. 3) initial training epochs, and then increased it to 0.001. This is to ensure that scaling issues in the initialization are dealt with at the outset, and to help avoid any blow-ups during training. Each model is trained for 1000 epochs in total with a fixed momentum of 0.9. The threshold parameter θ is fixed to 1.0 for both the TRec and TLin autoencoders. A training-loop sketch of this schedule appears after this table. |
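
To make the quoted zero-bias setup concrete, here is a minimal NumPy sketch of a tied-weight, bias-free autoencoder with the thresholded-rectified (TRec) and thresholded-linear (TLin) activations the paper discusses. The function names, array shapes, and the gradient derivation are illustrative assumptions, not taken from the authors' released code.

```python
import numpy as np

def hidden(X, W, theta=1.0, kind="TRec"):
    """Zero-bias hidden activations: no additive bias, only a threshold.
    Assumed forms: TRec: a * (a > theta); TLin: a * (|a| > theta), with a = X W^T."""
    A = X @ W.T
    if kind == "TRec":
        return A * (A > theta)
    if kind == "TLin":
        return A * (np.abs(A) > theta)
    raise ValueError(f"unknown activation kind: {kind}")

def reconstruct(H, W):
    """Tied-weight linear decoder, also without a bias term."""
    return H @ W

def loss_and_grad(X, W, theta=1.0, kind="TRec"):
    """Mean squared reconstruction error and its gradient w.r.t. W.
    The threshold gate is treated as constant on the active set, which is
    exact for these piecewise-linear activations away from the threshold."""
    N = X.shape[0]
    H = hidden(X, W, theta, kind)
    err = reconstruct(H, W) - X
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    grad_dec = H.T @ err / N                         # decoder path, shape (K, D)
    grad_enc = ((err @ W.T) * (H != 0)).T @ X / N    # encoder path, shape (K, D)
    return loss, grad_dec + grad_enc
```

Usage would look like `loss, G = loss_and_grad(X_batch, W)` with `W` initialized to small random values, e.g. `0.01 * np.random.randn(K, D)` (an illustrative choice, not the paper's initialization).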
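The optimisation details quoted in the Experiment Setup row translate into a plain SGD-with-momentum loop. The sketch below encodes the stated schedule (learning rate 0.0001 for the first 3 epochs, 0.001 afterwards, 1000 epochs, momentum 0.9, θ = 1.0); the minibatch size and the `grad_fn` interface are assumptions for illustration.

```python
import numpy as np

def train(W, X, grad_fn, theta=1.0, epochs=1000, warmup_epochs=3,
          lr_warmup=1e-4, lr=1e-3, momentum=0.9, batch_size=100, seed=0):
    """SGD with momentum following the schedule quoted above.
    grad_fn(batch, W, theta) is assumed to return (loss, gradient)."""
    rng = np.random.default_rng(seed)
    V = np.zeros_like(W)                                   # momentum buffer
    for epoch in range(epochs):
        step = lr_warmup if epoch < warmup_epochs else lr  # small LR for first few epochs
        perm = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = X[perm[start:start + batch_size]]
            _, G = grad_fn(batch, W, theta)
            V = momentum * V - step * G                    # classical momentum update
            W = W + V
    return W
```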
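The Dataset Splits row describes classifying the learned representation with logistic regression plus weight decay, the weight-cost parameter chosen by cross-validation on a 10,000-sample subset of the training set. Below is a rough sketch of that protocol using scikit-learn as a stand-in; the grid of regularization strengths and the 3-fold cross-validation are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def evaluate(H_train, y_train, H_test, y_test, subset_size=10_000, seed=0):
    """Select the weight-decay strength on a training subset, then retrain on
    the full training set and report test accuracy (CIFAR-10: 50,000 / 10,000)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(H_train), size=subset_size, replace=False)

    # Cross-validate the inverse weight cost C on the 10,000-sample subset.
    search = GridSearchCV(
        LogisticRegression(max_iter=1000),
        param_grid={"C": [1e-3, 1e-2, 1e-1, 1.0, 10.0]},
        cv=3,
    )
    search.fit(H_train[idx], y_train[idx])

    # Retrain with the selected setting on all training features.
    clf = LogisticRegression(max_iter=1000, C=search.best_params_["C"])
    clf.fit(H_train, y_train)
    return clf.score(H_test, y_test)
```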