Learning Deep Architectures via Generalized Whitened Neural Networks
Authors: Ping Luo
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various datasets demonstrate the benefits of GWNN. We compare WNN, pre-GWNN, and post-GWNN in the following aspects, including a) number of iterations when training converged, b) computation times for training, and c) generalization capacities on various datasets. We also conduct ablation studies... |
| Researcher Affiliation | Academia | (1) Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; (2) Multimedia Laboratory, The Chinese University of Hong Kong, Hong Kong. |
| Pseudocode | Yes | Algorithm 1 Training WNN. Algorithm 2 Training post-GWNN. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We employ the following datasets. a) MNIST (Lecun et al., 1998) has 60,000 28×28 images of 10 handwritten digits (0-9)... b) CIFAR-10 (Krizhevsky, 2009) consists of 50,000 32×32 color images... c) CIFAR-100 (Krizhevsky, 2009) has the same number of images as CIFAR-10... d) SVHN (Netzer et al., 2011) consists of color images of house numbers collected by Google Street View. |
| Dataset Splits | Yes | 5,000 images from the training set are randomly selected as a validation set. For CIFAR-10, 5,000 images are chosen for validation. We follow (Sermanet et al., 2012) to build a validation set by selecting 400 samples per class from the training set and 200 samples per class from the additional set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only implies general computing resources without specifying their configurations. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). It only mentions general concepts like 'SGD' but not the software stack used. |
| Experiment Setup | Yes | The search specifications of minibatch size, learning rate, and whitening interval τ are {64, 128, 256}, {0.1, 0.01, 0.001}, and {20, 50, 100, 10^3}, respectively. In particular, for WNN and pre-GWNN, the number of samples used to estimate the covariance matrix, N, is picked from {10^3, 10^4/2, 10^4}. For post-GWNN, N is chosen to be the same as the minibatch size and the decay period k = 0.1τ. For the two CIFAR datasets, we adopt minibatch size 64 and initial learning rate 0.1, which is reduced by half after every 25 epochs. We train for 250 epochs. As SVHN is a large dataset, we train for 100 epochs with minibatch size 128 and initial learning rate 0.05, which is reduced by half after every 10 epochs. |
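
To make the quoted setup concrete, the sketch below lays out the hyper-parameter search space and the two learning-rate schedules described in the Experiment Setup row. This is an illustrative reconstruction, not the authors' code; all names are assumptions.

```python
# Hypothetical sketch of the search space and LR schedules quoted in the
# "Experiment Setup" row; variable and function names are illustrative.

search_space = {
    "minibatch_size":     [64, 128, 256],
    "learning_rate":      [0.1, 0.01, 0.001],
    "whitening_interval": [20, 50, 100, 1000],   # tau
    "covariance_samples": [1000, 5000, 10000],   # N, for WNN and pre-GWNN
}

def cifar_lr(epoch: int, base_lr: float = 0.1) -> float:
    """CIFAR-10/100: initial LR 0.1, halved every 25 epochs (250 epochs total)."""
    return base_lr * (0.5 ** (epoch // 25))

def svhn_lr(epoch: int, base_lr: float = 0.05) -> float:
    """SVHN: initial LR 0.05, halved every 10 epochs (100 epochs total)."""
    return base_lr * (0.5 ** (epoch // 10))
```

For post-GWNN, the quoted setup ties N to the minibatch size and sets the decay period k = 0.1τ rather than drawing N from the grid above.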
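
The setup also refers to a whitening interval τ and a covariance sample count N. For context, here is a minimal NumPy sketch of the ZCA-style whitening step that WNN/GWNN-style training periodically re-estimates from N layer inputs every τ iterations; the function name and the ε stabilizer are assumptions, not details taken from the paper.

```python
import numpy as np

def zca_whitening_matrix(activations: np.ndarray, eps: float = 1e-5):
    """Estimate a ZCA whitening matrix from a batch of layer inputs.

    activations: (N, d) array of N samples used to estimate the covariance.
    Returns (mu, W) such that (x - mu) @ W.T has approximately identity covariance.
    """
    mu = activations.mean(axis=0)
    centered = activations - mu
    cov = centered.T @ centered / centered.shape[0]
    # Eigendecomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # W = (cov + eps*I)^(-1/2), the ZCA whitening transform.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mu, W
```

Per the quoted setup, WNN and pre-GWNN would estimate this covariance from N ∈ {10^3, 10^4/2, 10^4} samples, while post-GWNN reuses the current minibatch (N equal to the minibatch size).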