Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

Authors: Jun Li, Fuxin Li, Sinisa Todorovic

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN.
Researcher Affiliation | Academia | Jun Li, Li Fuxin, Sinisa Todorovic, School of EECS, Oregon State University, Corvallis, OR 97331, {liju2,lif,sinisa}@oregonstate.edu
Pseudocode | Yes | Algorithm 1: Cayley SGD with Momentum; Algorithm 2: Cayley ADAM (see the update-step sketch after this table).
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Datasets: We evaluate Cayley SGD or Cayley ADAM in image classification on the CIFAR10 and CIFAR100 datasets (Krizhevsky & Hinton, 2009). CIFAR10 and CIFAR100 consist of 50,000 training images and 10,000 test images, and have 10 and 100 mutually exclusive classes.
Dataset Splits | Yes | Pixel-by-pixel MNIST: ...we select 5,000 out of the 60,000 training examples for the early-stopping validation (see the split sketch after this table).
Hardware Specification | Yes | All algorithms are run on one TITAN Xp GPU.
Software Dependencies | No | The paper does not specify software versions for its implementation (e.g., Python, PyTorch, TensorFlow versions are not listed).
Experiment Setup | Yes | Training Strategies: We use different learning rates l_e and l_st for weights in Euclidean space and on the Stiefel manifold, respectively. We set the weight decay to 0.0005, momentum to 0.9, and minibatch size to 128. The initial learning rates are set as l_e = 0.01 and l_st = 0.2 for Cayley SGD, and l_e = 0.01 and l_st = 0.4 for Cayley ADAM. During training, we reduce the learning rates by a factor of 0.2 at 60, 120, and 160 epochs. The total number of epochs in training is 200. In training, the data samples are normalized using the mean and variance of the training set and augmented by randomly flipping training images. (A configuration sketch follows the table.)
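To make the pseudocode row concrete, below is a minimal NumPy sketch of the kind of update step Algorithm 1 (Cayley SGD with Momentum) describes: a momentum step on the Euclidean gradient, construction of a skew-symmetric matrix W so that W X lies in the tangent space of the Stiefel manifold at X, and a fixed-point iteration that approximates the Cayley transform Y = (I - lr/2 W)^{-1}(I + lr/2 W) X without forming a matrix inverse. This is a sketch of those ingredients, not a faithful transcription of the authors' Algorithm 1; the function name, default arguments, and number of inner iterations are illustrative choices.

```python
import numpy as np

def cayley_sgd_step(X, grad, M, lr=0.2, beta=0.9, inner_iters=2):
    """Illustrative Cayley-transform update with momentum (sketch, not the paper's exact Algorithm 1).

    X    : (n, p) parameter with orthonormal columns, X.T @ X = I.
    grad : (n, p) Euclidean gradient of the loss at X.
    M    : (n, p) momentum buffer.
    """
    # Heavy-ball momentum on the Euclidean gradient.
    M = beta * M - grad

    # Skew-symmetric matrix W built from the momentum; because W is
    # skew-symmetric, W @ X lies in the tangent space at X.
    W_hat = M @ X.T - 0.5 * X @ (X.T @ M @ X.T)
    W = W_hat - W_hat.T

    # Keep the momentum buffer in the tangent space for the next step.
    M = W @ X

    # Fixed-point iteration approximating the Cayley retraction
    #   Y = (I - lr/2 * W)^{-1} (I + lr/2 * W) @ X,
    # so no explicit matrix inverse is ever computed.
    Y = X + lr * M
    for _ in range(inner_iters):
        Y = X + (lr / 2.0) * (W @ (X + Y))
    return Y, M
```

Because the fixed-point iteration is only approximate, one might occasionally re-orthonormalize X (e.g., with np.linalg.qr) to control numerical drift away from the manifold; that housekeeping is omitted here.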
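For the dataset-splits row, the pixel-by-pixel MNIST protocol quoted above (5,000 of the 60,000 training examples held out for early stopping) could be reproduced with a split like the following. torchvision and the fixed seed are assumptions on our part; the paper does not name its data pipeline.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hold out 5,000 of the 60,000 MNIST training examples as a validation set
# for early stopping, as described in the paper.
full_train = datasets.MNIST(root="data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set, val_set = random_split(
    full_train, [55_000, 5_000],
    generator=torch.Generator().manual_seed(0),  # illustrative fixed seed
)
```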
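Finally, here is one way the training strategy in the experiment-setup row could be wired up. PyTorch is an assumption (the software-dependencies row notes that no framework is named), the two-layer model and the dimension-based parameter split are placeholders, and torch.optim.SGD merely stands in for the Cayley optimizers; only the hyperparameter values (l_e = 0.01, l_st = 0.2, momentum 0.9, weight decay 0.0005, batch size 128, decay by 0.2 at epochs 60/120/160, 200 epochs) come from the paper.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import MultiStepLR

# Stand-in model; any nn.Module works for illustrating the optimizer configuration.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))

# Illustrative split: multi-dimensional weights form the Stiefel-constrained
# group; biases and other vectors stay in the ordinary Euclidean group.
stiefel_params = [p for p in model.parameters() if p.dim() > 1]
euclidean_params = [p for p in model.parameters() if p.dim() <= 1]

# Reported Cayley SGD hyperparameters: l_e = 0.01, l_st = 0.2,
# momentum 0.9, weight decay 0.0005. torch.optim.SGD is only a stand-in;
# the Stiefel group would actually use the Cayley step sketched earlier.
optimizer = torch.optim.SGD(
    [{"params": euclidean_params, "lr": 0.01},
     {"params": stiefel_params, "lr": 0.2}],
    momentum=0.9,
    weight_decay=5e-4,
)

# Decay both learning rates by a factor of 0.2 at epochs 60, 120, and 160;
# train for 200 epochs with minibatches of size 128.
scheduler = MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):
    ...  # one training epoch over minibatches of size 128 goes here
    scheduler.step()
```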