Why Regularized Auto-Encoders learn Sparse Representation?
Authors: Devansh Arpit, Yingbo Zhou, Hung Ngo, Venu Govindaraju
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on our theoretical analysis, we also empirically study multiple popular AE models and activation functions in order to analyze their comparative behaviour in terms of sparsity in the learned representations. Our analysis thus shows why various AE models and activations lead to sparsity. As a result, they are unified under a framework uncovering the fundamental properties of regularizations and activation functions that most of these existing models possess. |
| Researcher Affiliation | Academia | Devansh Arpit DEVANSHA@BUFFALO.EDU Yingbo Zhou YINGBOZH@BUFFALO.EDU Hung Q. Ngo HUNGNGO@BUFFALO.EDU Venu Govindaraju GOVIND@BUFFALO.EDU SUNY Buffalo |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides mathematical derivations and proofs but no procedural algorithms in a structured format. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We use the following two datasets for our experiments: 1. MNIST (Lecun & Cortes): It is a 10 class dataset of handwritten digit images of which 50,000 images are provided for training. 2. CIFAR-10 (Krizhevsky, 2009): It consists of 60,000 32x32 color images of objects in 10 classes. For CIFAR-10, we randomly crop 50,000 patches of size 8x8 for training the auto-encoders. |
| Dataset Splits | No | The paper mentions using 50,000 images for training for both MNIST and CIFAR-10 but does not specify any validation or test splits. It does not provide percentages, absolute counts for validation, or reference predefined splits for reproducibility beyond the training data. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. It only describes the software setup and experimental protocols. |
| Software Dependencies | No | The paper states using "mini-batch stochastic gradient descent with momentum (0.9)" but does not specify the software libraries, frameworks, or their version numbers (e.g., TensorFlow, PyTorch, scikit-learn, with specific versions) used for implementation. |
| Experiment Setup | Yes | For all experiments, we use mini-batch stochastic gradient descent with momentum (0.9) for optimization, 50 epochs, batch size 50 and hidden units 1000. ... Learning Rate (LR): ... choose LR in the range (0.001, 0.005) for our experiments. |
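
The Open Datasets row above describes randomly cropping 50,000 patches of size 8x8 from the CIFAR-10 training images. A minimal sketch of that preparation step, assuming torchvision as the image loader and a fixed random seed (neither is specified in the paper):

```python
# Hypothetical sketch: extract 50,000 random 8x8 patches from the CIFAR-10
# training images, as described in the Open Datasets row. torchvision is used
# only to obtain the raw images; any CIFAR-10 loader would do.
import numpy as np
from torchvision.datasets import CIFAR10

rng = np.random.default_rng(0)  # assumed seed, not from the paper

# CIFAR-10 training set: 50,000 color images of size 32x32x3.
train_set = CIFAR10(root="./data", train=True, download=True)
images = train_set.data  # numpy array of shape (50000, 32, 32, 3), dtype uint8

n_patches, patch = 50_000, 8
patches = np.empty((n_patches, patch, patch, 3), dtype=np.uint8)
for i in range(n_patches):
    img = images[rng.integers(len(images))]          # pick a random image
    r, c = rng.integers(0, 32 - patch + 1, size=2)   # random top-left corner
    patches[i] = img[r:r + patch, c:c + patch]

# Flatten and rescale to [0, 1] before feeding the auto-encoder.
X = patches.reshape(n_patches, -1).astype(np.float32) / 255.0
print(X.shape)  # (50000, 192)
```

The Experiment Setup row quotes the optimization protocol, but the paper releases no code, so the following is only a hedged reconstruction: a single-hidden-layer auto-encoder with 1000 hidden units trained by mini-batch SGD with momentum 0.9, batch size 50, 50 epochs, and a learning rate inside (0.001, 0.005). The choice of a ReLU de-noising auto-encoder, the noise level, and the MNIST-sized input are illustrative assumptions; the paper compares several regularized AE variants and activation functions.

```python
# Hypothetical sketch of the quoted training protocol (not the authors' code).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

input_dim, hidden_dim = 784, 1000   # e.g. flattened 28x28 MNIST digits
encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
decoder = nn.Linear(hidden_dim, input_dim)
model = nn.Sequential(encoder, decoder)

# Mini-batch SGD with momentum 0.9; LR chosen from the quoted (0.001, 0.005) range.
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.9)
loss_fn = nn.MSELoss()

X = torch.rand(50_000, input_dim)   # stand-in for the real training images
loader = DataLoader(TensorDataset(X), batch_size=50, shuffle=True)

for epoch in range(50):
    for (x,) in loader:
        x_noisy = x + 0.1 * torch.randn_like(x)   # de-noising corruption (assumed)
        loss = loss_fn(model(x_noisy), x)         # reconstruct the clean input
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Sparsity of the learned representation: fraction of inactive hidden units,
# the quantity the paper's analysis is concerned with.
with torch.no_grad():
    h = encoder(X[:1000])
    print("mean fraction of zero activations:", (h == 0).float().mean().item())
```

The final check simply measures how many hidden units are exactly zero under the ReLU encoder, which is one straightforward way to quantify the sparsity behaviour the paper analyzes.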
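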