Riemannian approach to batch normalization

Authors: Minhyung Cho, Jaehyung Lee

NeurIPS 2017

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
LLM Response: We evaluated the proposed learning algorithm for image classification tasks using three benchmark datasets: CIFAR-10 [16], CIFAR-100 [16], and SVHN (Street View House Number) [17]. We used the VGG network [18] and wide residual network [2, 19, 20] for experiments.

Researcher Affiliation: Collaboration
LLM Response: Minhyung Cho (mhyung.cho@gmail.com) and Jaehyung Lee (jaehyung.lee@kaist.ac.kr), Applied Research Korea, Gracenote Inc.

Pseudocode: Yes
LLM Response: Algorithm 1, Gradient descent of a function f on an abstract Riemannian manifold M; Algorithm 2, Stochastic gradient descent with momentum on G(1, n); Algorithm 3, Adam on G(1, n); Algorithm 4, Batch normalization on product manifolds of G(1, n).

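For intuition, Algorithm 2 performs SGD with momentum directly on G(1, n), treating each weight vector as a unit-norm representative of a one-dimensional subspace of R^n. The sketch below is a minimal NumPy illustration under assumptions not taken from the paper: the function name is hypothetical, the retraction is a simple renormalization, and the momentum transport is a tangent-space projection, whereas the paper's pseudocode uses the exponential map and parallel transport.

```python
import numpy as np

def rsgd_momentum_step(y, egrad, m, lr_g=0.2, momentum=0.9):
    """One SGD-with-momentum step on G(1, n); y is a unit-norm point.
    Hypothetical helper for illustration, not the authors' code."""
    # Riemannian gradient: project the Euclidean gradient onto the
    # tangent space at y by removing the component along y.
    rgrad = egrad - np.dot(y, egrad) * y

    # Momentum buffer lives in the tangent space at y.
    m = momentum * m + rgrad

    # Step and retract back to the manifold by renormalizing
    # (a first-order stand-in for the exponential map).
    y_new = y - lr_g * m
    y_new /= np.linalg.norm(y_new)

    # Approximate parallel transport: project the momentum onto the
    # tangent space at the new point.
    m = m - np.dot(y_new, m) * y_new
    return y_new, m

# Toy usage: one step on a random 8-dimensional weight vector.
rng = np.random.default_rng(0)
y = rng.normal(size=8)
y /= np.linalg.norm(y)
y, m = rsgd_momentum_step(y, egrad=rng.normal(size=8), m=np.zeros(8))
```

The default lr_g=0.2 mirrors the initial Grassmann learning rate quoted in the Experiment Setup row; it is not a recommendation beyond that context.
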
Open Source Code: Yes
LLM Response: Source code is publicly available at https://github.com/MinhyungCho/riemannian-batch-normalization.

Open Datasets: Yes
LLM Response: We evaluated the proposed learning algorithm for image classification tasks using three benchmark datasets: CIFAR-10 [16], CIFAR-100 [16], and SVHN (Street View House Number) [17].

Dataset Splits: No
LLM Response: The paper specifies training and test set sizes for CIFAR-10 (50,000 training, 10,000 test) and SVHN (73,257 + 531,131 training, 26,032 test) but does not explicitly define a separate validation split.

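The quoted sizes match the standard public splits of these datasets. A hedged loading sketch follows, assuming torchvision (the paper does not name a framework); the root path and variable names are illustrative only.

```python
from torchvision import datasets, transforms

# CIFAR-10/100 ship only train/test splits; SVHN additionally has an
# 'extra' split of 531,131 images that the paper folds into training.
# No validation split is defined by the paper.
to_tensor = transforms.ToTensor()

cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)   # 50,000
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)   # 10,000

svhn_train = datasets.SVHN("data", split="train", download=True, transform=to_tensor)    # 73,257
svhn_extra = datasets.SVHN("data", split="extra", download=True, transform=to_tensor)    # 531,131
svhn_test = datasets.SVHN("data", split="test", download=True, transform=to_tensor)      # 26,032

print(len(cifar_train), len(cifar_test),
      len(svhn_train) + len(svhn_extra), len(svhn_test))
```
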
Hardware Specification: No
LLM Response: The paper states, 'This model was trained on two GPUs.' (footnote to Table 1), but does not provide specific details about the GPU models, CPUs, or other hardware used for the experiments.

Software Dependencies: No
LLM Response: The paper refers to algorithms and optimizers (e.g., SGD with Nesterov momentum, Adam) but does not specify any software dependencies with version numbers, such as programming languages, deep learning frameworks, or libraries.

Experiment Setup: Yes
LLM Response: For the baseline, the networks were trained by SGD with Nesterov momentum [22]. The weight decay was set to 0.0005, momentum to 0.9, and minibatch size to 128. For CIFAR experiments, the initial learning rate was set to 0.1 and multiplied by 0.2 at 60, 120, and 160 epochs. It was trained for a total of 200 epochs. For SVHN, the initial learning rate was set to 0.01 and multiplied by 0.1 at 60 and 120 epochs. It was trained for a total of 160 epochs. ... The selected initial learning rates were η_e = 0.01, η_g = 0.2 for Algorithm 2 and η_e = 0.01, η_g = 0.05 for Algorithm 3. ... The threshold for clipping the gradient ν was set to 0.1. The regularization strength α in Eq. (15) was set to 0.1.

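The quoted baseline schedule can be condensed into a short training-loop sketch. This is a minimal sketch assuming PyTorch, a tiny placeholder model, and random tensors in place of CIFAR data (none of which the paper specifies); it only mirrors the stated hyperparameters for the CIFAR baseline: SGD with Nesterov momentum, weight decay 0.0005, momentum 0.9, minibatch size 128, initial learning rate 0.1 multiplied by 0.2 at epochs 60, 120, and 160, trained for 200 epochs.

```python
import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and random stand-in data (assumptions, not from the paper).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))),
    batch_size=128, shuffle=True)  # minibatch size 128, as quoted

# SGD with Nesterov momentum, weight decay 0.0005, momentum 0.9.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      weight_decay=5e-4, nesterov=True)
# Initial LR 0.1, multiplied by 0.2 at epochs 60, 120, and 160.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                           milestones=[60, 120, 160], gamma=0.2)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):  # 200 epochs total for the CIFAR baseline
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the SVHN baseline, the quoted setup instead uses an initial learning rate of 0.01 multiplied by 0.1 at epochs 60 and 120, for 160 epochs; the Riemannian variants (Algorithms 2 and 3) additionally use the quoted η_e and η_g values, gradient-clipping threshold ν = 0.1, and regularization strength α = 0.1.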