Decentralized Riemannian Algorithm for Nonconvex Minimax Problems
Authors: Xidong Wu, Zhengmian Hu, Heng Huang
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results on Deep Neural Networks (DNNs) training over the Stiefel manifold demonstrate the efficiency of our algorithms. ... We conducted numerical experiments to validate the efficiency of our algorithms on two tasks: 1) orthonormal fair classification networks and 2) distributionally robust optimization with orthonormal weights. |
| Researcher Affiliation | Academia | Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, United States. xidong wu@outlook.com, huzhengmian@gmail.com, henghuanghh@gmail.com |
| Pseudocode | Yes | Algorithm 1: DRGDA Algorithm ... Algorithm 2: DRSGDA Algorithm (a hedged sketch of one Riemannian gradient-descent-ascent step is given below the table) |
| Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code for their methods is publicly released. |
| Open Datasets | Yes | In the experiment, we use the MNIST, Fashion-MNIST, and CIFAR-10 datasets as in (Huang, Wu, and Huang 2021). ... In this task, we also use the datasets MNIST, Fashion MNIST, and CIFAR-10 datasets with the same DNN architecture provided in the supplementary materials. |
| Dataset Splits | No | The paper mentions training, but does not provide specific details on how datasets are split for training, validation, or testing (e.g., percentages, counts, or a standard split citation that includes validation details). It mentions 'datasets are evenly divided into disjoint sets across all worker nodes', but this is about distribution, not train/val/test splits. |
| Hardware Specification | Yes | The experiments are conducted using computers with 2.3 GHz Intel Core i9 CPUs and NVIDIA Tesla P40 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation of the algorithms or experiments. |
| Experiment Setup | Yes | The grid search is used to tune parameters for all methods. For all datasets, we choose the {α, β, η} from the set {0.0001, 0.001, 0.005, 0.01} for DRGDA and DRSGDA. For other methods, we tune the learning rates from the set {0.0001, 0.001, 0.005, 0.01}. For DM-HSGD, we also set {βx, βy} from the set {0.1, 0.9}. The batch sizes for MNIST and Fashion-MNIST are 100 while that for CIFAR-10 is 50. The initial batch sizes for GT-SRVR and DM-HSGD are set as 300. ... The batch size for all methods is set as 100. The initial batch size for GT-SRVR and DM-HSGD is set as 300. (A hedged sketch of this hyperparameter grid is given below the table.) |
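To make the pseudocode row concrete, below is a minimal single-machine sketch of one Riemannian gradient-descent-ascent step over the Stiefel manifold, followed by a gossip-style consensus step in the spirit of the decentralized setting. The tangent-space projection, QR retraction, mixing matrix, and all function names here are illustrative assumptions; they do not reproduce the paper's exact DRGDA/DRSGDA updates.

```python
# A hedged, single-machine sketch of one Riemannian gradient-descent-ascent
# step on the Stiefel manifold plus a gossip-averaging consensus step.
# The projection, QR retraction, mixing matrix, and step sizes are
# illustrative assumptions, not the paper's exact DRGDA/DRSGDA pseudocode.
import numpy as np

def stiefel_project(X, G):
    """Project a Euclidean gradient G onto the tangent space at X (X^T X = I)."""
    XtG = X.T @ G
    sym = 0.5 * (XtG + XtG.T)
    return G - X @ sym

def qr_retract(X):
    """Retract an arbitrary matrix back onto the Stiefel manifold via QR."""
    Q, R = np.linalg.qr(X)
    # Fix column signs so the retraction is deterministic.
    return Q * np.sign(np.sign(np.diag(R)) + 0.5)

def rgda_step(X, y, grad_x, grad_y, alpha=1e-3, beta=1e-3):
    """One descent step in X (on the manifold) and one ascent step in y (Euclidean)."""
    xi = stiefel_project(X, grad_x)       # Riemannian gradient at X
    X_new = qr_retract(X - alpha * xi)    # descent step followed by retraction
    y_new = y + beta * grad_y             # gradient-ascent step in the max variable
    return X_new, y_new

def gossip_average(X_list, W):
    """Consensus step: mix iterates with a doubly stochastic matrix W, then retract."""
    mixed = [sum(W[i, j] * X_list[j] for j in range(len(X_list)))
             for i in range(len(X_list))]
    return [qr_retract(M) for M in mixed]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, workers = 8, 3, 4
    X_list = [qr_retract(rng.standard_normal((n, p))) for _ in range(workers)]
    y = rng.standard_normal(p)                       # shared max variable, for simplicity
    W = np.full((workers, workers), 1.0 / workers)   # toy mixing matrix
    A = rng.standard_normal((n, n))
    A = 0.5 * (A + A.T)                              # symmetric, so A @ X is a valid gradient
    # One toy round: local GDA steps on a quadratic, then gossip averaging.
    for i in range(workers):
        X_list[i], y = rgda_step(X_list[i], y, grad_x=A @ X_list[i], grad_y=-y)
    X_list = gossip_average(X_list, W)
    print("orthonormality error:",
          max(np.linalg.norm(X.T @ X - np.eye(p)) for X in X_list))
```

The QR-based retraction is only one common choice; a polar retraction or Cayley transform would serve the same role of keeping iterates on the manifold after the descent and consensus steps.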
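Similarly, the experiment-setup row amounts to a small hyperparameter grid. The constant names and the tuning loop below are an assumed illustration of the described grid search, not the authors' released script.

```python
# Hedged sketch of the hyperparameter grid quoted in the Experiment Setup row;
# all names are illustrative assumptions.
from itertools import product

STEP_SIZE_GRID = [0.0001, 0.001, 0.005, 0.01]       # candidate {alpha, beta, eta} values
DM_HSGD_MOMENTUM_GRID = [0.1, 0.9]                   # candidate {beta_x, beta_y} for DM-HSGD
BATCH_SIZE = {"MNIST": 100, "Fashion-MNIST": 100, "CIFAR-10": 50}
INITIAL_BATCH_SIZE = {"GT-SRVR": 300, "DM-HSGD": 300}

def grid_search(evaluate):
    """Exhaustively try every (alpha, beta, eta) triple and keep the best.

    `evaluate` is a user-supplied callable mapping a hyperparameter dict to a
    validation metric (higher is better); it stands in for a full training run.
    """
    best_score, best_cfg = float("-inf"), None
    for alpha, beta, eta in product(STEP_SIZE_GRID, repeat=3):
        cfg = {"alpha": alpha, "beta": beta, "eta": eta}
        score = evaluate(cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```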