Decentralized Riemannian Algorithm for Nonconvex Minimax Problems

Authors: Xidong Wu, Zhengmian Hu, Heng Huang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on the Deep Neural Networks (DNNs) training over the Stiefel manifold demonstrate the efficiency of our algorithms. ... Numerical Experiments: We conducted numerical experiments to validate the efficiency of our algorithms on two tasks: 1) orthonormal fair classification networks and 2) distributionally robust optimization with orthonormal weights.
Researcher Affiliation | Academia | Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, United States. xidong_wu@outlook.com, huzhengmian@gmail.com, henghuanghh@gmail.com
Pseudocode | Yes | Algorithm 1: DRGDA Algorithm ... Algorithm 2: DRSGDA Algorithm (an illustrative update sketch appears after the table)
Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code for their methods is publicly released.
Open Datasets | Yes | In the experiment, we use the MNIST, Fashion-MNIST, and CIFAR-10 datasets as in (Huang, Wu, and Huang 2021). ... In this task, we also use the MNIST, Fashion-MNIST, and CIFAR-10 datasets with the same DNN architecture provided in the supplementary materials.
Dataset Splits | No | The paper mentions training, but does not provide specific details on how the datasets are split into training, validation, or test sets (e.g., percentages, counts, or a standard-split citation that covers validation). It mentions that 'datasets are evenly divided into disjoint sets across all worker nodes', but this describes distribution across workers, not train/val/test splits. (A loading-and-partitioning sketch appears after the table.)
Hardware Specification | Yes | The experiments are conducted using computers with 2.3 GHz Intel Core i9 CPUs and NVIDIA Tesla P40 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation of the algorithms or experiments.
Experiment Setup | Yes | The grid search is used to tune parameters for all methods. For all datasets, we choose the {α, β, η} from the set {0.0001, 0.001, 0.005, 0.01} for DRGDA and DRSGDA. For other methods, we tune the learning rates from the set {0.0001, 0.001, 0.005, 0.01}. For DM-HSGD, we also set {βx, βy} from the set {0.1, 0.9}. The batch sizes for MNIST and Fashion-MNIST are 100, while that for CIFAR-10 is 50. The initial batch sizes for GT-SRVR and DM-HSGD are set as 300. ... The batch size for all methods is set as 100. The initial batch size for GT-SRVR and DM-HSGD is set as 300. (A grid-search sketch appears after the table.)
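The Pseudocode entry above points to Algorithm 1 (DRGDA) and Algorithm 2 (DRSGDA), whose exact update rules are not reproduced here. For orientation only, the NumPy sketch below shows one generic decentralized Riemannian gradient descent ascent step over the Stiefel manifold: the tangent-space projection, the QR retraction, the simple neighbor averaging, and all function and parameter names are our assumptions, not the authors' DRGDA/DRSGDA updates.

```python
import numpy as np

def sym(A):
    return 0.5 * (A + A.T)

def proj_tangent(X, G):
    # Project a Euclidean gradient G onto the tangent space of the
    # Stiefel manifold {X : X^T X = I} at X (embedded Euclidean metric).
    return G - X @ sym(X.T @ G)

def qr_retract(X, V):
    # QR retraction: map X + V (point plus tangent step) back onto the
    # manifold; the thin-QR "Q" factor has orthonormal columns.
    Q, _ = np.linalg.qr(X + V)
    return Q

def local_step(X_nbrs, y_nbrs, mix_w, grad_x_fn, grad_y_fn,
               eta_x=0.001, eta_y=0.001):
    """One illustrative update on a single worker node.

    X_nbrs / y_nbrs hold the node's own iterate plus its neighbors';
    mix_w are the corresponding entries of a doubly stochastic mixing
    matrix; grad_x_fn / grad_y_fn are assumed local (stochastic)
    gradient oracles. This is a generic sketch, not the authors' exact
    algorithm.
    """
    # Consensus step: mix iterates, then pull the x-average back onto the manifold.
    X_mix = sum(w * Xn for w, Xn in zip(mix_w, X_nbrs))
    y_mix = sum(w * yn for w, yn in zip(mix_w, y_nbrs))
    X_mix, _ = np.linalg.qr(X_mix)

    # Riemannian descent in x, Euclidean ascent in y.
    rgrad = proj_tangent(X_mix, grad_x_fn(X_mix, y_mix))
    X_new = qr_retract(X_mix, -eta_x * rgrad)
    y_new = y_mix + eta_y * grad_y_fn(X_mix, y_mix)
    return X_new, y_new
```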
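The Open Datasets and Dataset Splits entries note that MNIST, Fashion-MNIST, and CIFAR-10 are used and that each dataset is "evenly divided into disjoint sets across all worker nodes", without reported train/validation/test ratios. The torchvision sketch below shows one way such a disjoint, even partition could be produced; the `load_and_shard` helper, the ToTensor transform, the shuffling seed, and the 8-node example are illustrative assumptions.

```python
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def load_and_shard(name, num_workers, root="./data", seed=0):
    """Download one of the public datasets and split its training set
    into `num_workers` disjoint, (nearly) equally sized shards."""
    ctor = {"mnist": datasets.MNIST,
            "fashion-mnist": datasets.FashionMNIST,
            "cifar10": datasets.CIFAR10}[name]
    train = ctor(root, train=True, download=True,
                 transform=transforms.ToTensor())

    # Shuffle indices once, then hand each worker a disjoint chunk.
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(train), generator=g)
    shards = torch.chunk(perm, num_workers)
    return [Subset(train, idx.tolist()) for idx in shards]

# Example: 8 worker nodes, each receiving a disjoint share of CIFAR-10.
worker_datasets = load_and_shard("cifar10", num_workers=8)
```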
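The Experiment Setup entry describes a grid search over {α, β, η} drawn from {0.0001, 0.001, 0.005, 0.01}. A minimal sketch of such an exhaustive search follows; the `train_and_eval` callable and the higher-is-better selection metric are assumptions, since the paper does not state how the best configuration was chosen.

```python
from itertools import product

# Candidate values reported in the paper's setup description.
GRID = [0.0001, 0.001, 0.005, 0.01]

def grid_search(train_and_eval):
    """Try every (alpha, beta, eta) combination and keep the best one.
    `train_and_eval` is an assumed callable that runs training with the
    given hyperparameters and returns a validation metric."""
    best_cfg, best_score = None, float("-inf")
    for alpha, beta, eta in product(GRID, repeat=3):
        score = train_and_eval(alpha=alpha, beta=beta, eta=eta)
        if score > best_score:
            best_cfg, best_score = (alpha, beta, eta), score
    return best_cfg, best_score
```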