Regularizing by the Variance of the Activations' Sample-Variances

Authors: Etai Littwin, Lior Wolf

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate an improvement in accuracy over the batchnorm technique for both CNNs and fully connected networks.
Researcher Affiliation | Collaboration | 1 Tel Aviv University, 2 Facebook AI Research
Pseudocode | No | The paper does not include any pseudocode or algorithm blocks.
Open Source Code | No | To support reproducibility, the entire code of all of our experiments is to be promptly released.
Open Datasets | Yes | The two CIFAR datasets (Krizhevsky & Hinton, 2009) consist of colored natural images sized at 32×32 pixels. The Tiny ImageNet dataset consists of a subset of ImageNet [16]. UCI: We also apply VCL to the 44 UCI datasets with more than 1000 samples. The train/test splits were provided by the authors of [11].
Dataset Splits | Yes | For each dataset, there are 50,000 training images and 10,000 images reserved for testing. The Tiny ImageNet dataset consists of a subset of ImageNet [16], with 200 different classes, each of which has 500 training images and 50 validation images, downscaled to 64×64. The train/test splits were provided by the authors of [11].
Hardware Specification | Yes | Table 1: Time in seconds per 100 iterations (CIFAR-100); column headers: Method, Intel i7 CPU, Volta GPU.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies.
Experiment Setup | Yes | For all experiments, 500 epochs are used and a batch size N of 250. We employ a learning rate of 0.05, which was reduced at epoch 180 to 0.02, and further reduced by a factor of 10 every 100 epochs. A momentum of 0.9 was used and the L2 regularization term was weighed by 0.0001. The hyperparameters of VCL are fixed: the weight of the VCL regularization is set to γ = 0.01.
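
Since the setup row above is the only place the training recipe is quantified, here is a minimal sketch of how those numbers could be wired together, assuming PyTorch (the paper does not name a framework). The toy model, the choice of which activations to regularize, and the exact form of `vcl_penalty` are illustrative assumptions rather than the authors' implementation; the penalty is written as a simple proxy for the quantity named in the title, the variance of the activations' sample-variances, estimated from two halves of a batch.

```python
# Hypothetical sketch only -- not the authors' released code. It wires the
# quoted hyperparameters into a single PyTorch training step; the toy model
# and the form of vcl_penalty are assumptions made for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA = 0.01  # quoted weight of the VCL regularization term


def vcl_penalty(activations: torch.Tensor) -> torch.Tensor:
    # Naive proxy for "the variance of the activations' sample-variances":
    # estimate each activation's sample variance on two halves of the batch
    # and take the variance of the two estimates (for two values this is
    # half the squared difference). The paper's exact formulation may differ.
    half = activations.shape[0] // 2
    var_a = activations[:half].var(dim=0, unbiased=True)
    var_b = activations[half:].var(dim=0, unbiased=True)
    return (0.5 * (var_a - var_b) ** 2).mean()


def lr_at_epoch(epoch: int) -> float:
    # Quoted schedule: 0.05 until epoch 180, then 0.02, then divided by 10
    # every 100 epochs (epochs 280, 380, 480), for 500 epochs in total.
    if epoch < 180:
        return 0.05
    lr = 0.02
    for _ in range(280, epoch + 1, 100):
        lr /= 10.0
    return lr


# Assumed toy network for CIFAR-sized inputs; the paper's CNN and fully
# connected architectures are not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512),
                      nn.ReLU(), nn.Linear(512, 100))

# Quoted optimizer settings: momentum 0.9, L2 (weight decay) 0.0001.
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=1e-4)

# One illustrative step on random data with the quoted batch size N = 250.
epoch = 0
for group in optimizer.param_groups:
    group["lr"] = lr_at_epoch(epoch)
x, y = torch.randn(250, 3, 32, 32), torch.randint(0, 100, (250,))
hidden = model[:-1](x)                      # activations to regularize
logits = model[-1](hidden)
loss = F.cross_entropy(logits, y) + GAMMA * vcl_penalty(hidden)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a full run, `lr_at_epoch` would be applied at the start of each of the 500 epochs; everything else in the sketch follows directly from the values quoted in the table.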