Communication-Efficient Stochastic Gradient Descent Ascent with Momentum Algorithms

Authors: Yihan Zhang, Meikang Qiu, Hongchang Gao

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we apply our algorithm to the distributed AUC maximization problem for the imbalanced data classification task. Extensive experimental results confirm the efficacy of our algorithm in saving communication cost." ... "We conducted extensive experiments on the imbalanced classification task, which confirms the effectiveness of our algorithms. In Figure 1, we report the testing AUC score versus the number of epochs on testing sets." (A standard min-max reformulation of AUC maximization is sketched after the table.)
Researcher Affiliation | Academia | "Temple University, Philadelphia, PA, USA" and "Dakota State University, Madison, SD, USA"
Pseudocode | Yes | "Algorithm 1 SGDAM-PEF" ... "Algorithm 2 SGDAM-REF" (An illustrative sketch of this style of compressed momentum SGDA update appears after the table.)
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to code repositories.
Open Datasets | Yes | "Datasets. In our experiments, five benchmark datasets are employed to evaluate the performance of our algorithm. They are CAT vs DOG [1], CIFAR10, CIFAR100 [2], STL10 [Coates et al., 2011], Melanoma [Rotemberg et al., 2021]." [1] https://www.kaggle.com/c/dogs-vs-cats [2] https://www.cs.toronto.edu/~kriz/cifar.html (An illustrative imbalanced binary split of CIFAR10 is sketched after the table.)
Dataset Splits | No | "The training set is randomly distributed to all workers, while the testing set is the same for all workers." The paper mentions training and testing sets but does not specify a validation set or explicit train/validation/test split percentages.
Hardware Specification | Yes | "Here we use four workers where each worker is a V100-GPU."
Software Dependencies | No | The paper mentions employing a quantization operator but does not provide specific software names with version numbers for reproducibility (e.g., Python 3.x, PyTorch 1.x, CUDA 11.x).
Experiment Setup | Yes | "Input: η > 0, γ > 0, λ > 0, ρ1 > 0, ρ2 > 0, r0 = 0, s0 = 0." "The compression operators in our experiment include Top-k and Rand-k where k = 20%." ... "the quantization level is set to 4." ... "we use the equivalent learning rate for all algorithms, i.e., 0.1." (Sketches of the Top-k and Rand-k compression operators appear after the table.)
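For context on why a gradient descent ascent method applies to AUC maximization (a step the summary above leaves implicit): with a square-loss surrogate, AUC maximization is commonly rewritten as a min-max problem in the style of Ying et al. (2016). The paper's exact objective is not reproduced here, so the formulation below is only the standard reformulation; p denotes the fraction of positive examples and h_w the scoring model.

```latex
\min_{\mathbf{w},a,b}\ \max_{\alpha \in \mathbb{R}}\;
\mathbb{E}_{(x,y)}\Big[ (1-p)\,(h_{\mathbf{w}}(x)-a)^2\,\mathbb{I}[y=1]
 + p\,(h_{\mathbf{w}}(x)-b)^2\,\mathbb{I}[y=-1]
 + 2(1+\alpha)\big(p\,h_{\mathbf{w}}(x)\,\mathbb{I}[y=-1] - (1-p)\,h_{\mathbf{w}}(x)\,\mathbb{I}[y=1]\big)
 - p(1-p)\,\alpha^2 \Big]
```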
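The pseudocode for SGDAM-PEF and SGDAM-REF is not reproduced in this summary. As a rough orientation only, the sketch below shows the general pattern such algorithms follow on each worker: momentum stochastic gradient descent ascent whose messages are compressed, with the compression error kept locally and fed back into the next round. The variable names, update order, and error-feedback placement are assumptions for illustration, not the paper's algorithms.

```python
import torch

def worker_step(x, y, grad_x, grad_y, m_x, m_y, e_x, e_y,
                compress, eta=0.1, gamma=0.1, beta=0.9):
    """One illustrative local round of compressed momentum SGDA with error feedback.

    x, y          : primal (minimization) and dual (maximization) variables
    grad_x, grad_y: stochastic gradients w.r.t. x and y
    m_x, m_y      : momentum buffers
    e_x, e_y      : local error-feedback residuals
    compress      : a compression operator such as top_k or rand_k
    """
    # Momentum accumulation.
    m_x = beta * m_x + (1 - beta) * grad_x
    m_y = beta * m_y + (1 - beta) * grad_y

    # Compress the error-corrected momentum; keep what was lost as the new residual.
    msg_x = compress(m_x + e_x)
    msg_y = compress(m_y + e_y)
    e_x = (m_x + e_x) - msg_x
    e_y = (m_y + e_y) - msg_y

    # In a distributed run, msg_x / msg_y would be averaged across workers first.
    x = x - eta * msg_x     # descent step on the primal variable
    y = y + gamma * msg_y   # ascent step on the dual variable
    return x, y, m_x, m_y, e_x, e_y
```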
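The experiment setup row lists Top-k and Rand-k with k = 20% and a quantization level of 4. The paper's implementations are not available, so the following is a minimal sketch of the two sparsification operators under their standard definitions (the unbiased rescaling in rand_k is a common convention and an assumption here); the quantization operator is omitted.

```python
import torch

def top_k(x: torch.Tensor, ratio: float = 0.2) -> torch.Tensor:
    """Keep the largest-magnitude `ratio` fraction of entries; zero out the rest."""
    flat = x.flatten()
    k = max(1, int(ratio * flat.numel()))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(x)

def rand_k(x: torch.Tensor, ratio: float = 0.2) -> torch.Tensor:
    """Keep a uniformly random `ratio` fraction of entries, rescaled so that
    the operator is unbiased (E[rand_k(x)] = x)."""
    flat = x.flatten()
    k = max(1, int(ratio * flat.numel()))
    idx = torch.randperm(flat.numel(), device=flat.device)[:k]
    out = torch.zeros_like(flat)
    out[idx] = flat[idx] * (flat.numel() / k)
    return out.view_as(x)
```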
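The datasets are used for imbalanced binary classification, but this summary does not state how the imbalance is constructed. As one hypothetical construction only (the positive class, imbalance ratio, and helper name are illustrative, not the paper's protocol), an imbalanced binary task can be derived from CIFAR10 with torchvision as follows:

```python
import numpy as np
from torchvision import datasets

def make_imbalanced_binary(root="./data", pos_class=0, pos_ratio=0.1, seed=0):
    """Turn CIFAR10 into an imbalanced binary task: one class is positive, the rest
    negative, and only `pos_ratio` of the positives are kept (illustrative only)."""
    train = datasets.CIFAR10(root, train=True, download=True)
    labels = np.array(train.targets)
    pos_idx = np.where(labels == pos_class)[0]
    neg_idx = np.where(labels != pos_class)[0]
    rng = np.random.default_rng(seed)
    keep_pos = rng.choice(pos_idx, size=int(pos_ratio * len(pos_idx)), replace=False)
    keep = np.concatenate([keep_pos, neg_idx])
    binary_labels = (labels[keep] == pos_class).astype(np.int64)
    return train.data[keep], binary_labels
```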