Towards Tight Communication Lower Bounds for Distributed Optimisation

Authors: Janne H. Korhonen, Dan Alistarh

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our main result provides the first fully unconditional bounds on the total number of bits that need to be sent and received by the N machines to solve this problem under point-to-point communication, within a given error tolerance. Specifically, we show that Ω(Nd log d/Nε) total bits need to be communicated between the machines to find an additive ε-approximation to the minimum of ∑_{i=1}^N f_i(x) (restated in display form below the table). The result holds for both deterministic and randomised algorithms and, importantly, requires no assumptions on the algorithm structure.
Researcher Affiliation | Collaboration | Dan Alistarh (IST Austria & Neural Magic, Inc., dan.alistarh@ist.ac.at); Janne H. Korhonen (IST Austria, janne.korhonen@ist.ac.at)
Pseudocode | Yes | The algorithm proceeds in rounds t = 1, 2, ..., T. At the beginning of round t + 1, each node i knows the values of the iterate x^(t), the global quantisation estimate q^(t), and its local quantisation estimate q_i^(t), for i = 1, 2, ..., N. At step t, the nodes perform the following steps (a runnable sketch is given below the table):
(1) Each node i updates its iterate as x^(t+1) = x^(t) − γ q^(t).
(2) Each node i computes its local gradient at x^(t+1) and transmits it in quantised form to the coordinator as follows. Let ε₁ = δR^(t+1)/2N and R₁ = R^(t+1)/N.
  (a) Node i computes ∇f_i(x^(t+1)) locally, and sends the message m_i = enc_{ε₁, R₁}(∇f_i(x^(t+1))) to the coordinator.
  (b) The coordinator receives the messages m_i for i = 1, 2, ..., N, and decodes them as q_i^(t+1) = dec_{ε₁, R₁}(q_i^(t), m_i). The coordinator then computes r^(t+1) = ∑_{i=1}^N q_i^(t+1).
(3) The coordinator sends the quantised sum of gradients to all other nodes as follows. Let ε₂ = δR^(t+1)/2 and R₂ = 2R^(t+1).
  (a) The coordinator sends the message m = enc_{ε₂, R₂}(r^(t+1)) to each node i.
  (b) Each node decodes the coordinator's message as q^(t+1) = dec_{ε₂, R₂}(q^(t), m).
Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the described methodology.
Open Datasets | No | The paper discusses mathematical bounds and theoretical algorithms for optimising quadratic functions (of the form x ↦ β₀‖x − x₀‖₂²), not the use of specific public datasets for training. Thus, no information on dataset availability is provided.
Dataset Splits | No | The paper discusses theoretical bounds and algorithms. It does not mention using specific datasets or providing training/test/validation splits, as it does not conduct empirical experiments.
Hardware Specification | No | The paper is theoretical and focuses on communication complexity, algorithms, and proofs. It does not describe any empirical experiments or the hardware used to run them.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and algorithm design. It does not discuss specific software dependencies or their version numbers for implementation.
Experiment Setup | No | The paper focuses on theoretical contributions (lower bounds and algorithm design with analysis) rather than empirical experiments. Therefore, no specific experimental setup details such as hyperparameters or training configurations are provided.
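For readability, the lower bound quoted in the Research Type row is restated below as a display formula. This is a transcription only: ε is restored where the extraction dropped the symbol, and the grouping of the fraction is kept exactly as in the abstract's inline form.

```latex
% Main result, restated from the abstract: N machines, each holding a
% d-dimensional local function f_i; \hat{x} denotes the algorithm's output.
% The bound holds for deterministic and randomised algorithms alike, under
% point-to-point communication.
\Omega\bigl( Nd \log d / N\varepsilon \bigr)
\quad \text{total bits to guarantee} \quad
\sum_{i=1}^{N} f_i(\hat{x}) \;\le\; \min_{x} \sum_{i=1}^{N} f_i(x) + \varepsilon .
```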
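The round structure from the Pseudocode row can also be sketched as code. The following is a minimal, hypothetical Python simulation under stated assumptions: enc/dec are illustrative stand-ins (coordinate-wise rounding to an ε-grid) rather than the lattice-based quantiser the paper relies on, the range parameters R₁, R₂ and the per-round shrinking of R^(t) are omitted, and all names (quantised_gd, enc, dec) are invented for this sketch.

```python
import numpy as np

def enc(eps, x):
    # Hypothetical stand-in quantiser: round each coordinate to an eps-grid.
    # The paper's scheme is lattice-based and its message length also depends
    # on a range bound R; that dependence is dropped here.
    return np.round(x / eps).astype(np.int64)

def dec(eps, hint, msg):
    # The paper's decoder uses `hint` (a previous estimate known to be close
    # to the encoded vector) to keep messages short; this stand-in ignores it
    # and simply rescales the rounded coordinates.
    _ = hint
    return msg * eps

def quantised_gd(grads, x0, gamma, delta, R, T):
    """Simulate T rounds with one coordinator and N = len(grads) nodes.

    grads[i] maps an iterate x to node i's local gradient at x.
    R is kept fixed here; in the paper, R^(t) shrinks from round to round.
    """
    N = len(grads)
    x = x0.astype(float).copy()
    q = np.zeros_like(x)                        # global quantised estimate q^(t)
    q_i = [np.zeros_like(x) for _ in range(N)]  # coordinator's per-node estimates

    for _ in range(T):
        # Step (1): every node takes the same step using the shared estimate.
        x = x - gamma * q

        # Step (2): nodes quantise and send local gradients; coordinator decodes.
        eps1 = delta * R / (2 * N)
        msgs = [enc(eps1, g(x)) for g in grads]
        q_i = [dec(eps1, q_i[i], m) for i, m in enumerate(msgs)]
        r = np.sum(q_i, axis=0)                 # r^(t+1): decoded gradient sum

        # Step (3): coordinator broadcasts the quantised sum; nodes decode it.
        eps2 = delta * R / 2
        q = dec(eps2, q, enc(eps2, r))
    return x
```

A small usage example, with quadratic local functions in the spirit of the Open Datasets row (each node holds f_i(x) = ½‖x − c_i‖₂², so the minimiser of the sum is the mean of the centres):

```python
centres = [np.array([1.0, -2.0]), np.array([3.0, 0.5]), np.array([0.0, 4.0])]
grads = [lambda x, c=c: x - c for c in centres]   # grad f_i(x) = x - c_i
x_hat = quantised_gd(grads, x0=np.zeros(2), gamma=0.3, delta=0.05, R=8.0, T=100)
# x_hat should be close to the mean of the centres, up to quantisation error.
```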