Learning Deep Architectures via Generalized Whitened Neural Networks

Authors: Ping Luo

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on various datasets demonstrate the benefits of GWNN. We compare WNN, pre-GWNN, and post-GWNN in the following aspects, including a) number of iterations when training converged, b) computation times for training, and c) generalization capacities on various datasets. We also conduct ablation studies...
Researcher Affiliation | Academia | (1) Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; (2) Multimedia Laboratory, The Chinese University of Hong Kong, Hong Kong.
Pseudocode | Yes | Algorithm 1: Training WNN. Algorithm 2: Training post-GWNN. (A hedged sketch of the whitening step appears below the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We employ the following datasets. a) MNIST (LeCun et al., 1998) has 60,000 28×28 images of 10 handwritten digits (0-9)... b) CIFAR-10 (Krizhevsky, 2009) consists of 50,000 32×32 color images... c) CIFAR-100 (Krizhevsky, 2009) has the same number of images as CIFAR-10... d) SVHN (Netzer et al., 2011) consists of color images of house numbers collected by Google Street View. (A loading sketch appears below the table.)
Dataset Splits | Yes | 5,000 images from the training set are randomly selected as a validation set. For CIFAR-10, 5,000 images are chosen for validation. We follow (Sermanet et al., 2012) to build a validation set by selecting 400 samples per class from the training set and 200 samples per class from the additional set. (A split sketch appears below the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only implies general computing resources without specifying their configurations.
Software Dependencies | No | The paper does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). It only mentions general concepts like 'SGD' but not the software stack used.
Experiment Setup | Yes | The search specifications of minibatch size, learning rate, and whitening interval τ are {64, 128, 256}, {0.1, 0.01, 0.001}, and {20, 50, 100, 10^3}, respectively. In particular, for WNN and pre-GWNN, the number of samples used to estimate the covariance matrix, N, is picked from {10^3, 10^4/2, 10^4}. For post-GWNN, N is chosen to be the same as the minibatch size and the decay period k = 0.1τ. For the two CIFAR datasets, we adopt minibatch size 64 and initial learning rate 0.1, which is reduced by half after every 25 epochs. We train for 250 epochs. As SVHN is a large dataset, we train for 100 epochs with minibatch size 128 and initial learning rate 0.05, which is reduced by half after every 10 epochs. (A configuration sketch appears below the table.)
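
To make the Pseudocode row concrete: the paper's Algorithms 1 and 2 train networks whose layer inputs are whitened, with the whitening transform re-estimated from N samples every τ iterations. Below is a minimal NumPy sketch of that whitening step only, not a reimplementation of the paper's algorithms; the names `zca_whitener`, `mix`, `tau`, and `n_estimate` are my own illustrative choices.

```python
import numpy as np

def zca_whitener(x, eps=1e-5):
    """Estimate a ZCA whitening transform from samples x of shape (N, D).

    Returns (W, mu) such that (x - mu) @ W.T has approximately identity covariance.
    """
    mu = x.mean(axis=0)
    xc = x - mu
    cov = xc.T @ xc / max(len(x) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric positive semi-definite
    w = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return w, mu

rng = np.random.default_rng(0)
mix = rng.normal(size=(16, 16))  # fixed mixing matrix: a stand-in for one layer's input distribution
tau = 100                        # whitening interval, cf. the search space {20, 50, 100, 10^3}
n_estimate = 1000                # samples for covariance estimation, cf. {10^3, 10^4/2, 10^4}
w, mu = None, None

for it in range(300):
    if it % tau == 0:            # periodically re-estimate the whitening transform
        w, mu = zca_whitener(rng.normal(size=(n_estimate, 16)) @ mix)
    batch = rng.normal(size=(64, 16)) @ mix
    whitened = (batch - mu) @ w.T  # decorrelated input that would feed the layer
    # ... the layer's forward pass and the usual SGD update would follow here.

check = (rng.normal(size=(5000, 16)) @ mix - mu) @ w.T
print(np.abs(np.cov(check, rowvar=False) - np.eye(16)).max())  # small, up to estimation noise
```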
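The four datasets in the Open Datasets row are all publicly available through standard loaders. The snippet below is a sketch assuming torchvision and a local `./data` directory; the paper does not name any loading framework.

```python
# Sketch of obtaining the four benchmark datasets via torchvision (an assumption,
# not something the paper specifies).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
root = "./data"

mnist_train = datasets.MNIST(root, train=True, download=True, transform=to_tensor)      # 60,000 28x28 digit images
cifar10_train = datasets.CIFAR10(root, train=True, download=True, transform=to_tensor)   # 50,000 32x32 color images
cifar100_train = datasets.CIFAR100(root, train=True, download=True, transform=to_tensor) # same size as CIFAR-10, 100 classes
svhn_train = datasets.SVHN(root, split="train", download=True, transform=to_tensor)      # house-number images
svhn_extra = datasets.SVHN(root, split="extra", download=True, transform=to_tensor)      # the "additional set" used below
```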
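The Dataset Splits row describes the validation sets only in prose. Continuing the loading sketch above (it reuses `cifar10_train`, `svhn_train`, and `svhn_extra`), the snippet below is an illustrative reconstruction of that sampling; the helpers `holdout` and `per_class_indices` are my own names, not the paper's.

```python
# Illustrative reconstruction of the validation splits; the paper gives no code.
import numpy as np

rng = np.random.default_rng(0)

def holdout(n_total, n_val):
    """Randomly split indices into a training part and an n_val-sized validation part."""
    perm = rng.permutation(n_total)
    return perm[n_val:], perm[:n_val]

train_idx, val_idx = holdout(len(cifar10_train), 5000)  # 5,000 validation images for CIFAR-10

def per_class_indices(labels, per_class):
    """Pick `per_class` indices for each class label (Sermanet et al., 2012 style)."""
    labels = np.asarray(labels)
    picked = [rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
              for c in np.unique(labels)]
    return np.concatenate(picked)

svhn_val_from_train = per_class_indices(svhn_train.labels, 400)  # 400 per class from the training set
svhn_val_from_extra = per_class_indices(svhn_extra.labels, 200)  # 200 per class from the additional set
```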
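Finally, the Experiment Setup row translates directly into a configuration. The dictionaries and step schedule below only restate the reported numbers; the names `SEARCH_SPACE`, `CIFAR_SCHEDULE`, `SVHN_SCHEDULE`, and `stepped_lr` are illustrative assumptions.

```python
# Hyperparameter search space and training schedules as reported in the
# Experiment Setup row; structure and names are my own.
SEARCH_SPACE = {
    "minibatch_size": [64, 128, 256],
    "learning_rate": [0.1, 0.01, 0.001],
    "whitening_interval_tau": [20, 50, 100, 1000],
    # For WNN and pre-GWNN only; post-GWNN sets N to the minibatch size
    # and uses decay period k = 0.1 * tau.
    "covariance_samples_N": [1000, 5000, 10000],
}

CIFAR_SCHEDULE = {"epochs": 250, "minibatch": 64, "lr0": 0.1, "halve_every": 25}
SVHN_SCHEDULE = {"epochs": 100, "minibatch": 128, "lr0": 0.05, "halve_every": 10}

def stepped_lr(epoch, lr0, halve_every):
    """Learning rate halved after every `halve_every` epochs."""
    return lr0 * (0.5 ** (epoch // halve_every))

# Example: learning rate at epoch 60 on the CIFAR datasets.
print(stepped_lr(60, CIFAR_SCHEDULE["lr0"], CIFAR_SCHEDULE["halve_every"]))  # 0.1 * 0.5**2 = 0.025
```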