Winner-Take-All Autoencoders

Authors: Alireza Makhzani, Brendan J. Frey

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We will show that winner-take-all autoencoders can be used to learn deep sparse representations from the MNIST, CIFAR-10, ImageNet, Street View House Numbers and Toronto Face datasets, and achieve competitive classification performance.
Researcher Affiliation | Academia | Alireza Makhzani, Brendan Frey, University of Toronto, {makhzani, frey}@psi.toronto.edu
Pseudocode | No | The paper describes its methods in prose but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | An IPython demo for reproducing important results of this paper is publicly available at http://www.comm.utoronto.ca/~makhzani/
Open Datasets | Yes | We will show that winner-take-all autoencoders can be used to learn deep sparse representations from the MNIST, CIFAR-10, ImageNet, Street View House Numbers and Toronto Face datasets, and achieve competitive classification performance. The MNIST dataset has 60K training points and 10K test points. The SVHN dataset has about 600K training points and 26K test points.
Dataset Splits | No | The paper specifies training and test set sizes (e.g., 60K training points and 10K test points for MNIST) but does not explicitly mention a separate validation split or how such a split was handled for hyperparameter tuning or early stopping.
Hardware Specification | No | We also acknowledge the support of NVIDIA with the donation of the GPUs used for this research. While GPUs are mentioned, no specific models (e.g., NVIDIA A100, RTX 3090) or other hardware details (CPU, RAM) are provided for reproducibility.
Software Dependencies | No | The paper mentions an "IPython demo" and the use of SVMs, but it does not specify any software names with version numbers (e.g., Python 3.x, TensorFlow x.x, PyTorch x.x, scikit-learn x.x).
Experiment Setup | Yes | FC-WTA autoencoders can aim for any target sparsity rate... train very fast... have no hyper-parameter to be tuned (except the target sparsity rate). The CONV-WTA autoencoder... encoder typically consists of a stack of several ReLU convolutional layers (e.g., 5×5 filters) and the decoder is a linear deconvolutional layer of larger size (e.g., 11×11 filters). Fig. 4a depicts the filters of a convolutional autoencoder with 16 maps, 20% input and 50% hidden unit dropout trained on the Street View House Numbers dataset.
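To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the two winner-take-all sparsity steps and the CONV-WTA architecture described in that row. It is not the authors' IPython demo: the `spatial_sparsity` and `lifetime_sparsity` helpers, the number of encoder layers, and the MNIST-sized input are illustrative assumptions based only on the quoted description (stacked 5×5 ReLU convolutions, a single 11×11 linear deconvolution, 16 maps, a target sparsity rate as the main hyper-parameter).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_sparsity(h):
    """CONV-WTA-style spatial sparsity: keep only the single largest
    activation in each feature map and zero the rest."""
    n, c, height, width = h.shape
    flat = h.view(n, c, -1)
    _, idx = flat.max(dim=2, keepdim=True)            # winner per (sample, map)
    mask = torch.zeros_like(flat).scatter_(2, idx, 1.0)
    return (flat * mask).view(n, c, height, width)


def lifetime_sparsity(h, rate=0.05):
    """FC-WTA-style lifetime sparsity: for each hidden unit, keep only its
    top `rate` fraction of activations across the mini-batch; `rate` plays
    the role of the target sparsity rate mentioned in the quote."""
    k = max(1, int(rate * h.shape[0]))
    thresh = h.topk(k, dim=0).values[-1:]             # k-th largest per unit
    return h * (h >= thresh).float()


class ConvWTA(nn.Module):
    """Encoder: stack of ReLU convolutional layers with 5x5 filters.
    Decoder: one linear (no nonlinearity) deconvolution with 11x11 filters."""

    def __init__(self, maps=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, maps, 5, padding=2), nn.ReLU(),
            nn.Conv2d(maps, maps, 5, padding=2), nn.ReLU(),
            nn.Conv2d(maps, maps, 5, padding=2), nn.ReLU(),
        )
        self.dec = nn.ConvTranspose2d(maps, 1, 11, padding=5)

    def forward(self, x):
        h = self.enc(x)
        h = spatial_sparsity(h)   # winner-take-all step before decoding
        return self.dec(h)


# Usage: reconstruct a batch of 28x28 grayscale images (MNIST-sized)
# with a plain mean-squared reconstruction loss.
model = ConvWTA(maps=16)
x = torch.rand(8, 1, 28, 28)
loss = F.mse_loss(model(x), x)
loss.backward()
```

Because the winner-take-all step multiplies activations by a 0/1 mask, gradients in this sketch flow only through the winning units of each feature map, which matches the sparse training behavior the paper describes.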