CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression
Authors: Zhize Li, Peter Richtárik
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Experiments: In this section, we demonstrate the performance of our accelerated method CANITA (Algorithm 1) and previous methods QSGD and DIANA (the theoretical convergence results of these algorithms can be found in Table 1) with different compression operators on the logistic regression problem $\min_{x \in \mathbb{R}^d} f(x) := \frac{1}{n} \sum_{i=1}^{n} \log\big(1 + \exp(-b_i a_i^\top x)\big)$ (14), where $\{a_i, b_i\}_{i=1}^{n} \in \mathbb{R}^d \times \{\pm 1\}$ are data samples. We use three standard datasets: a9a, mushrooms, and w8a in the experiments. All datasets are downloaded from LIBSVM [4]. In Figures 1–3, we compare our CANITA with QSGD and DIANA with three compression operators: random sparsification (left), natural compression (middle), and random quantization (right) on three datasets: a9a (Figure 1), mushrooms (Figure 2), and w8a (Figure 3). The x-axis and y-axis represent the number of communication bits and the training loss, respectively. |
| Researcher Affiliation | Academia | Zhize Li, KAUST, zhize.li@kaust.edu.sa; Peter Richtárik, KAUST, peter.richtarik@kaust.edu.sa |
| Pseudocode | Yes | Algorithm 1 Distributed compressed accelerated ANITA method (CANITA) |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use three standard datasets: a9a, mushrooms, and w8a in the experiments. All datasets are downloaded from LIBSVM [4]. |
| Dataset Splits | No | The paper mentions using 'three standard datasets: a9a, mushrooms, and w8a' but does not specify any explicit training, validation, or test split percentages or sample counts for these datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions using LIBSVM but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | In our experiments, we directly use the theoretical stepsizes and parameters for all three algorithms: QSGD [1, 24], DIANA [12], our CANITA (Algorithm 1). To compare with the settings of DIANA and CANITA, we use local gradients (not stochastic gradients) in QSGD. |
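For context, the logistic regression objective (14) and two of the compression operators named in the experiments (random sparsification and natural compression) can be sketched in NumPy. This is an illustrative reimplementation, not the authors' code: the function names, the rand-k form of sparsification, and the power-of-two rounding details are assumptions consistent with standard definitions of these operators.

```python
import numpy as np

def logistic_loss(x, A, b):
    """Objective (14): f(x) = (1/n) * sum_i log(1 + exp(-b_i * a_i^T x)).

    A has shape (n, d), b has entries in {-1, +1}.
    """
    margins = -b * (A @ x)
    # log(1 + exp(m)) computed stably as logaddexp(0, m)
    return np.mean(np.logaddexp(0.0, margins))

def rand_k(g, k, rng):
    """Random sparsification (rand-k): keep k random coordinates,
    rescale by d/k so the compressor is unbiased, E[rand_k(g)] = g."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(g)
    out[idx] = g[idx] * (d / k)
    return out

def natural_compression(g, rng):
    """Natural compression: randomly round each magnitude to one of the
    two nearest powers of two, with probabilities chosen for unbiasedness."""
    out = np.zeros_like(g)
    nz = g != 0
    mag = np.abs(g[nz])
    low = 2.0 ** np.floor(np.log2(mag))   # nearest power of two below
    p = mag / low - 1.0                    # prob. of rounding up to 2*low
    up = rng.random(mag.size) < p
    out[nz] = np.sign(g[nz]) * low * np.where(up, 2.0, 1.0)
    return out
```

In the distributed setting of the paper, each worker would compress its local gradient with one such operator before communication; both sketches are unbiased, which is the property the convergence analysis of CANITA, QSGD, and DIANA relies on.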