k-Sparse Autoencoders

Authors: Alireza Makhzani; Brendan Frey

ICLR 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of k-sparse autoencoders in both unsupervised learning and in shallow and deep discriminative learning tasks. We use the MNIST handwritten digit dataset, which consists of 60,000 training images and 10,000 test images. We randomly separate the training set into 50,000 training cases and 10,000 cases for validation.
Researcher Affiliation | Academia | Alireza Makhzani (makhzani@psi.utoronto.ca) and Brendan Frey (frey@psi.utoronto.ca), University of Toronto, 10 King's College Rd., Toronto, Ontario M5S 3G4, Canada
Pseudocode | Yes | k-Sparse Autoencoders: 1) Perform the feedforward phase and compute z = W^T x + b. 2) Find the k largest activations of z and set the rest to zero: z_{Γ^c} = 0, where Γ = supp_k(z). 3) Compute the output and the error using the sparsified z: x̂ = Wz + b', E = ||x − x̂||_2^2. 4) Backpropagate the error through the k largest activations defined by Γ and iterate. Sparse Encoding: Compute the features h = W^T x + b; find its αk largest activations and set the rest to zero: h_{Γ^c} = 0, where Γ = supp_{αk}(h). (A minimal code sketch of this training step appears after the table.)
Open Source Code | No | The paper mentions using 'the publicly available gnumpy library (Tieleman, 2010)' but does not state that the authors' own implementation of the k-sparse autoencoder is open source or otherwise available.
Open Datasets | Yes | We use the MNIST handwritten digit dataset, which consists of 60,000 training images and 10,000 test images. We also use the small NORB normalized-uniform dataset (LeCun et al., 2004), which contains 24,300 training examples and 24,300 test examples. We also test our method on natural image patches extracted from the CIFAR-10 dataset.
Dataset Splits | Yes | We randomly separate the [MNIST] training set into 50,000 training cases and 10,000 cases for validation. The [NORB] training set is separated into 20,000 for training and 4,300 for validation. (A sketch of such a random hold-out split appears after the table.)
Hardware Specification | Yes | We used an efficient GPU implementation obtained using the publicly available gnumpy library (Tieleman, 2010) on a single Nvidia GTX 680 GPU.
Software Dependencies | No | The paper mentions the 'publicly available gnumpy library (Tieleman, 2010)' but does not provide a specific version number for gnumpy itself.
Experiment Setup | Yes | We optimized the model parameters using stochastic gradient descent with momentum: v_{k+1} = m_k v_k − η_k ∇f(x_k), x_{k+1} = x_k + v_{k+1} (Eq. 11), where v_k is the velocity vector, m_k is the momentum and η_k is the learning rate at the k-th iteration. The weights are initialized from a Gaussian distribution with standard deviation σ. Different momentum values, learning rates and initializations were used depending on the task and the dataset, with hyperparameters selected on the validation set. Unsupervised MNIST: σ = 0.01, m_k = 0.9, η_k = 0.01, for 5000 epochs. Supervised MNIST: training started with m_k = 0.25 and η_k = 1, and the learning rate was linearly decreased to 0.001 over 200 epochs. Unsupervised NORB: σ = 0.01, m_k = 0.9, η_k = 0.0001, for 5000 epochs. Supervised NORB: training started with m_k = 0.9 and η_k = 0.01, and the learning rate was linearly decreased to 0.001 over 200 epochs. (A sketch of this update rule and schedule appears after the table.)
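
The pseudocode row above can be made concrete with a short NumPy sketch of one k-sparse autoencoder training step: tied weights W, top-k support selection, reconstruction from the sparsified code, and backpropagation only through the selected units. This is a minimal illustrative sketch, not the authors' gnumpy implementation; the function name `ksparse_step`, the plain single-example SGD update, and the array shapes are assumptions.

```python
import numpy as np

def ksparse_step(x, W, b, b_prime, k, lr=0.01):
    """One k-sparse autoencoder training step (illustrative sketch).

    x: input (d,), W: tied weights (d, h), b: hidden bias (h,),
    b_prime: output bias (d,), k: number of hidden units kept active.
    """
    # 1) Feedforward: z = W^T x + b
    z = W.T @ x + b

    # 2) Keep the k largest activations (support Gamma), zero the rest
    support = np.argsort(z)[-k:]
    z_sparse = np.zeros_like(z)
    z_sparse[support] = z[support]

    # 3) Reconstruct and compute the squared error E = ||x - x_hat||^2
    x_hat = W @ z_sparse + b_prime
    err = x_hat - x
    loss = np.sum(err ** 2)

    # 4) Backpropagate only through the k activations in the support
    grad_z = W.T @ (2 * err)
    grad_z_masked = np.zeros_like(grad_z)
    grad_z_masked[support] = grad_z[support]

    # Tied weights: W appears in both the decoder and the encoder path
    grad_W = np.outer(2 * err, z_sparse) + np.outer(x, grad_z_masked)
    grad_b = grad_z_masked
    grad_b_prime = 2 * err

    # Plain gradient step (the paper uses momentum; see the last sketch below)
    W -= lr * grad_W
    b -= lr * grad_b
    b_prime -= lr * grad_b_prime
    return loss
```

At test time the same top-k operation is applied with αk instead of k to compute the features h, matching the Sparse Encoding step quoted above.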
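
The dataset-splits row quotes a 50,000/10,000 hold-out of the MNIST training set and a 20,000/4,300 hold-out of the NORB training set. A minimal sketch of such a random hold-out split, assuming the full training arrays are already loaded (the function name `random_split`, the fixed seed, and the variable names are assumptions), could look like this:

```python
import numpy as np

def random_split(X, y, n_valid, seed=0):
    """Randomly hold out n_valid examples for validation (illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    return (X[train_idx], y[train_idx]), (X[valid_idx], y[valid_idx])

# MNIST: 60,000 training images -> 50,000 train / 10,000 validation
# (X_tr, y_tr), (X_va, y_va) = random_split(X_mnist, y_mnist, n_valid=10_000)
# NORB:  24,300 training examples -> 20,000 train / 4,300 validation
# (X_tr, y_tr), (X_va, y_va) = random_split(X_norb, y_norb, n_valid=4_300)
```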
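
Finally, the experiment-setup row describes classical momentum SGD (Eq. 11) with, for the supervised tasks, a learning rate decayed linearly over 200 epochs. The sketch below shows that update and schedule; the helper names (`momentum_update`, `linear_lr`) and the 784x1000 weight shape in the usage comment are illustrative assumptions, while the quoted values (σ = 0.01, m_k = 0.9, η from 1 to 0.001) come from the table above.

```python
import numpy as np

def momentum_update(x, v, grad, momentum, lr):
    """Classical momentum: v <- m*v - eta*grad(f)(x), x <- x + v."""
    v = momentum * v - lr * grad
    x = x + v
    return x, v

def linear_lr(epoch, n_epochs=200, lr_start=1.0, lr_end=0.001):
    """Linearly decay the learning rate from lr_start to lr_end over n_epochs
    (the supervised MNIST schedule quoted above: eta 1 -> 0.001)."""
    t = min(epoch / (n_epochs - 1), 1.0)
    return lr_start + t * (lr_end - lr_start)

# Gaussian initialization with sigma = 0.01, momentum 0.9 (unsupervised MNIST);
# the 784x1000 shape is an assumed hidden-layer size for illustration only.
sigma = 0.01
W = sigma * np.random.randn(784, 1000)
V = np.zeros_like(W)
```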