Riemannian approach to batch normalization

Authors: Minhyung Cho, Jaehyung Lee

NeurIPS 2017

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
LLM Response: We evaluated the proposed learning algorithm for image classification tasks using three benchmark datasets: CIFAR-10 [16], CIFAR-100 [16], and SVHN (Street View House Number) [17]. We used the VGG network [18] and wide residual network [2, 19, 20] for experiments.

Researcher Affiliation: Collaboration
LLM Response: Minhyung Cho (mhyung.cho@gmail.com) and Jaehyung Lee (jaehyung.lee@kaist.ac.kr), Applied Research Korea, Gracenote Inc.

Pseudocode: Yes
LLM Response: Algorithm 1, Gradient descent of a function f on an abstract Riemannian manifold M; Algorithm 2, Stochastic gradient descent with momentum on G(1, n); Algorithm 3, Adam on G(1, n); Algorithm 4, Batch normalization on product manifolds of G(1, n).

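For intuition, Algorithm 2 performs SGD with momentum directly on G(1, n), treating each weight vector as a unit-norm representative of a one-dimensional subspace of R^n. The sketch below is a minimal NumPy illustration under assumptions not taken from the paper: the function name is hypothetical, the retraction is a simple renormalization, and the momentum transport is a tangent-space projection, whereas the paper's pseudocode uses the exponential map and parallel transport.

```python
import numpy as np

def rsgd_momentum_step(y, egrad, m, lr_g=0.2, momentum=0.9):
    """One SGD-with-momentum step on G(1, n); y is a unit-norm point.
    Hypothetical helper for illustration, not the authors' code."""
    # Riemannian gradient: project the Euclidean gradient onto the
    # tangent space at y by removing the component along y.
    rgrad = egrad - np.dot(y, egrad) * y

    # Momentum buffer lives in the tangent space at y.
    m = momentum * m + rgrad

    # Step and retract back to the manifold by renormalizing
    # (a first-order stand-in for the exponential map).
    y_new = y - lr_g * m
    y_new /= np.linalg.norm(y_new)

    # Approximate parallel transport: project the momentum onto the
    # tangent space at the new point.
    m = m - np.dot(y_new, m) * y_new
    return y_new, m

# Toy usage: one step on a random 8-dimensional weight vector.
rng = np.random.default_rng(0)
y = rng.normal(size=8)
y /= np.linalg.norm(y)
y, m = rsgd_momentum_step(y, egrad=rng.normal(size=8), m=np.zeros(8))
```

The default lr_g=0.2 mirrors the initial Grassmann learning rate quoted in the Experiment Setup row; it is not a recommendation beyond that context.
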
Open Source Code: Yes
LLM Response: Source code is publicly available at https://github.com/MinhyungCho/riemannian-batch-normalization.

Open Datasets: Yes
LLM Response: We evaluated the proposed learning algorithm for image classification tasks using three benchmark datasets: CIFAR-10 [16], CIFAR-100 [16], and SVHN (Street View House Number) [17].

Dataset Splits: No
LLM Response: The paper specifies training and test set sizes for CIFAR-10 (50,000 training, 10,000 test) and SVHN (73,257 + 531,131 training, 26,032 test) but does not explicitly define a separate validation split.

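The quoted sizes match the standard public splits of these datasets. A hedged loading sketch follows, assuming torchvision (the paper does not name a framework); the root path and variable names are illustrative only.

```python
from torchvision import datasets, transforms

# CIFAR-10/100 ship only train/test splits; SVHN additionally has an
# 'extra' split of 531,131 images that the paper folds into training.
# No validation split is defined by the paper.
to_tensor = transforms.ToTensor()

cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)   # 50,000
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=to_tensor)   # 10,000

svhn_train = datasets.SVHN("data", split="train", download=True, transform=to_tensor)    # 73,257
svhn_extra = datasets.SVHN("data", split="extra", download=True, transform=to_tensor)    # 531,131
svhn_test = datasets.SVHN("data", split="test", download=True, transform=to_tensor)      # 26,032

print(len(cifar_train), len(cifar_test),
      len(svhn_train) + len(svhn_extra), len(svhn_test))
```
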
Hardware Specification: No
LLM Response: The paper states, 'This model was trained on two GPUs.' (footnote to Table 1), but does not provide specific details about the GPU models, CPUs, or other hardware used for the experiments.

Software Dependencies: No
LLM Response: The paper refers to algorithms and optimizers (e.g., SGD with Nesterov momentum, Adam) but does not specify any software dependencies with version numbers, such as programming languages, deep learning frameworks, or libraries.

Experiment Setup: Yes
LLM Response: For the baseline, the networks were trained by SGD with Nesterov momentum [22]. The weight decay was set to 0.0005, momentum to 0.9, and minibatch size to 128. For CIFAR experiments, the initial learning rate was set to 0.1 and multiplied by 0.2 at 60, 120, and 160 epochs. It was trained for a total of 200 epochs. For SVHN, the initial learning rate was set to 0.01 and multiplied by 0.1 at 60 and 120 epochs. It was trained for a total of 160 epochs. ... The selected initial learning rates were η_e = 0.01, η_g = 0.2 for Algorithm 2 and η_e = 0.01, η_g = 0.05 for Algorithm 3. ... The threshold for clipping the gradient ν was set to 0.1. The regularization strength α in Eq. (15) was set to 0.1.

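The quoted baseline schedule can be condensed into a short training-loop sketch. This is a minimal sketch assuming PyTorch, a tiny placeholder model, and random tensors in place of CIFAR data (none of which the paper specifies); it only mirrors the stated hyperparameters for the CIFAR baseline: SGD with Nesterov momentum, weight decay 0.0005, momentum 0.9, minibatch size 128, initial learning rate 0.1 multiplied by 0.2 at epochs 60, 120, and 160, trained for 200 epochs.

```python
import torch
import torch.nn as nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and random stand-in data (assumptions, not from the paper).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))),
    batch_size=128, shuffle=True)  # minibatch size 128, as quoted

# SGD with Nesterov momentum, weight decay 0.0005, momentum 0.9.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      weight_decay=5e-4, nesterov=True)
# Initial LR 0.1, multiplied by 0.2 at epochs 60, 120, and 160.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer,
                                           milestones=[60, 120, 160], gamma=0.2)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):  # 200 epochs total for the CIFAR baseline
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the SVHN baseline, the quoted setup instead uses an initial learning rate of 0.01 multiplied by 0.1 at epochs 60 and 120, for 160 epochs; the Riemannian variants (Algorithms 2 and 3) additionally use the quoted η_e and η_g values, gradient-clipping threshold ν = 0.1, and regularization strength α = 0.1.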