Riemannian approach to batch normalization
Authors: Minhyung Cho, Jaehyung Lee
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the proposed learning algorithm for image classification tasks using three benchmark datasets: CIFAR-10 [16], CIFAR-100 [16], and SVHN (Street View House Number) [17]. We used the VGG network [18] and wide residual network [2, 19, 20] for experiments. |
| Researcher Affiliation | Collaboration | Minhyung Cho, Jaehyung Lee; Applied Research Korea, Gracenote Inc.; mhyung.cho@gmail.com, jaehyung.lee@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 Gradient descent of a function f on an abstract Riemannian manifold M; Algorithm 2 Stochastic gradient descent with momentum on G(1, n); Algorithm 3 Adam on G(1, n); Algorithm 4 Batch normalization on product manifolds of G(1, ·). A minimal sketch of the core update on G(1, n) appears after the table. |
| Open Source Code | Yes | Source code is publicly available at https://github.com/MinhyungCho/riemannian-batch-normalization. |
| Open Datasets | Yes | We evaluated the proposed learning algorithm for image classification tasks using three benchmark datasets: CIFAR-10 [16], CIFAR-100 [16], and SVHN (Street View House Number) [17]. |
| Dataset Splits | No | The paper specifies training and test set sizes for CIFAR-10 (50,000 training, 10,000 test) and SVHN (73,257 + 531,131 training, 26,032 test) but does not explicitly define a separate validation dataset split. |
| Hardware Specification | No | The paper states that 'This model was trained on two GPUs' (footnote to Table 1) but does not specify the GPU models, CPU, or other hardware used for the experiments. |
| Software Dependencies | No | The paper refers to algorithms and optimizers (e.g., SGD with Nesterov momentum, Adam) but does not specify any software dependencies with version numbers, such as programming languages, deep learning frameworks, or libraries. |
| Experiment Setup | Yes | For the baseline, the networks were trained by SGD with Nesterov momentum [22]. The weight decay was set to 0.0005, momentum to 0.9, and minibatch size to 128. For CIFAR experiments, the initial learning rate was set to 0.1 and multiplied by 0.2 at 60, 120, and 160 epochs. It was trained for a total of 200 epochs. For SVHN, the initial learning rate was set to 0.01 and multiplied by 0.1 at 60 and 120 epochs. It was trained for a total of 160 epochs. ... The selected initial learning rates were ηe = 0.01, ηg = 0.2 for Algorithm 2 and ηe = 0.01, ηg = 0.05 for Algorithm 3. ... The threshold for clipping the gradient ν was set to 0.1. The regularization strength α in Eq. (15) was set to 0.1. A configuration sketch of the baseline setup appears after the table. |
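
The paper's pseudocode (Algorithms 1–4) is not reproduced here, but the basic idea of Riemannian gradient descent on G(1, n), where a one-dimensional subspace is represented by a unit vector, can be sketched in a few lines of NumPy. This is a minimal, momentum-free illustration, not the authors' implementation (available at the GitHub link above); the paper's Algorithms 2–4 further add momentum with parallel transport, gradient clipping, and Adam-style adaptive steps.

```python
import numpy as np

def riemannian_grad(y, g):
    """Project the Euclidean gradient g onto the tangent space at y,
    where y is a unit vector representing a point on G(1, n)."""
    return g - np.dot(y, g) * y

def exp_map(y, v):
    """Exponential map on the unit sphere: follow the geodesic from y
    in the direction of the tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return y
    return np.cos(norm_v) * y + np.sin(norm_v) * (v / norm_v)

def riemannian_sgd_step(y, euclidean_grad, lr=0.01):
    """One plain Riemannian SGD step on G(1, n)."""
    h = riemannian_grad(y, euclidean_grad)
    return exp_map(y, -lr * h)

# Example: a random point on G(1, 64) and a random Euclidean gradient.
y = np.random.randn(64)
y /= np.linalg.norm(y)
y_next = riemannian_sgd_step(y, np.random.randn(64))
```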
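For the baseline settings quoted in the Experiment Setup row, the following is a hedged PyTorch sketch of the stated optimizer and CIFAR learning-rate schedule. The paper does not name its software framework, so PyTorch is an assumption made purely for illustration, and the `model` placeholder stands in for the VGG or wide residual networks actually used.

```python
import torch

# Placeholder model for illustration only; the paper uses VGG and
# wide residual networks on 32x32 images.
model = torch.nn.Linear(3 * 32 * 32, 10)

# Baseline quoted in the paper: SGD with Nesterov momentum,
# weight decay 0.0005, momentum 0.9, minibatch size 128.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=5e-4, nesterov=True)

# CIFAR schedule: initial learning rate 0.1, multiplied by 0.2 at
# epochs 60, 120, and 160, trained for 200 epochs in total.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    # ... one pass over the training set with minibatch size 128 ...
    scheduler.step()
```

For SVHN, the quoted setup changes only the schedule: initial learning rate 0.01, decayed by 0.1 at epochs 60 and 120, for 160 epochs in total.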