A Flexible Framework for Communication-Efficient Machine Learning

Authors: Sarit Khirirat, Sindri Magnússon, Arda Aytekin, Mikael Johansson

AAAI 2021, pp. 8101-8109 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Theoretical results and practical experiments indicate that the automatic tuning strategies significantly increase communication efficiency on several state-of-the-art compression schemes. We evaluate the performance of our CAT framework for dynamic sparsification and quantization (S+Q) in the single-master, single-worker setup on the URL data set with 2.4 million data points and 3.2 million features. We compare the results to gradient descent and Alistarh's S+Q (Alistarh et al. 2017). We implement all algorithms in Julia, and run them on 4 nodes using MPI, splitting the data evenly between the nodes. The right-most plot in Figure 3 shows that our CAT S+Q outperforms all other compression schemes.
Researcher Affiliation | Collaboration | Sarit Khirirat (1), Sindri Magnússon (2), Arda Aytekin (3), Mikael Johansson (1); (1) Division of Decision and Control Systems, KTH Royal Institute of Technology, Sweden; (2) Department of Computer and System Science, Stockholm University, Sweden; (3) Ericsson, Sweden
Pseudocode | Yes | Step 1: T_i = argmax_{T ⊆ [1,d]} (‖∇_T F(x_i)‖² / (2L)) / C(T); Step 2: x_{i+1} = x_i − (1/L) Q_{T_i}(∇F(x_i)). (9) Step 1: T_i = argmax_{T ⊆ [1,d]} β_i(T)/C(T); Step 2: γ_i = ⟨∇F(x_i), Q_{T_i}(∇F(x_i))⟩² / ‖∇_{T_i} F(x_i)‖₂⁴; Step 3: x_{i+1} = x_i − γ_i Q_{T_i}(∇F(x_i)). (12) Step 1: T_i = argmax_{T ⊆ [1,d]} ω_i(T)/C(T); Step 2: γ_i = ω_i(T_i)/L; Step 3: x_{i+1} = x_i − γ_i Q_{T_i}(∇F(x_i)). (15) (An illustrative sketch of the top-K rule (9) follows the table.)
Open Source Code | No | The paper mentions using 'POLO (Aytekin, Biel, and Johansson 2018)' but does not state that the code for the specific methodology described in this paper is open-source or provide a link.
Open Datasets | Yes | We evaluate the performance of our CAT framework for dynamic sparsification and quantization (S+Q) in the single-master, single-worker setup on the URL data set with 2.4 million data points and 3.2 million features. We evaluate the performance of our CAT tuning rules on deterministic sparsification (SG), stochastic sparsification (SS), and sparsification with quantization (S+Q) in a multi-node setting on RCV1.
Dataset Splits | No | The paper mentions using the RCV1 and URL datasets but does not specify how they were split into training, validation, and test sets, nor does it refer to standard splits.
Hardware Specification | No | The paper describes the network setup ('1000 Mbit Internet connection', '4 nodes using MPI') but does not specify any particular hardware components like CPU or GPU models used for computation.
Software Dependencies | No | The paper mentions 'ZMQ library', 'C++ library POLO', and 'Julia, and run them on 4 nodes using MPI', but it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | We ran 30,000 iterations with step-size according to Lemma 3. After obtaining a linear fit to the measured communication cost (see the supplementary materials for details), we ran 30,000 iterations with step-size according to Lemma 3. In both cases, the floating point precision is FPP = 64. We use the packet communication model (6) with c1 = 128 bytes, c0 = 64 bytes and Pmax = 128 bytes. In all cases we use the packet communication model (6) with c1 = 576 bytes, c0 = 64 bytes and Pmax = 512 bytes. (An illustrative sketch of a packet-style cost model with these parameters follows the table.)
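
The pseudocode row above describes choosing, at every iteration, the support T_i that maximizes the predicted descent per unit of communication cost, and then taking a compressed gradient step. The following is a minimal Julia sketch of the deterministic top-K case in equation (9), written against stated assumptions rather than the authors' code: the function name cat_topk_step and the idea of passing the communication-cost model C as a callable are ours.

```julia
# Minimal sketch of the CAT rule for deterministic (top-K) sparsification, eq. (9).
# Step 1: pick K maximizing (‖∇_T F(x)‖² / 2L) / C(T) over the best support of each size.
# Step 2: x_{i+1} = x_i − (1/L) Q_{T_i}(∇F(x_i)).
# `cost(K)` is any communication-cost model mapping the number of sent coordinates to a cost.
function cat_topk_step(x::Vector{Float64}, grad::Vector{Float64}, cost; L::Float64 = 1.0)
    d = length(grad)
    order = sortperm(abs.(grad), rev = true)        # coordinates by decreasing magnitude
    gain = cumsum(grad[order] .^ 2) ./ (2 * L)      # ‖∇_T F(x)‖²/(2L) for the best support of each size K
    K = argmax([gain[k] / cost(k) for k in 1:d])    # descent per unit of communication cost
    T = order[1:K]
    xnew = copy(x)
    xnew[T] .-= grad[T] ./ L                        # compressed gradient step on the chosen support
    return xnew, T
end
```

Sorting the gradient once makes the best support of every size K available through a cumulative sum, so the argmax over K costs O(d log d) per iteration.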
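
The Experiment Setup row quotes the parameters of the packet communication model (6) but not its functional form. The snippet below therefore assumes a simple stand-in form, a fixed c0-byte overhead plus c1 bytes for every started packet of at most Pmax payload bytes, only to show how the quoted settings could enter the selection rule; packet_cost, bytes_per_entry, and the assumed 32-bit index width are illustrative choices, not taken from the paper.

```julia
# Hypothetical stand-in for the packet communication model (6): a c0-byte header
# plus c1 bytes for every started packet of at most Pmax payload bytes.
# The paper's exact formula is not quoted above, so this form is an assumption.
packet_cost(nbytes; c0 = 64, c1 = 128, Pmax = 128) = c0 + c1 * cld(nbytes, Pmax)

# The two quoted parameter sets, assuming 12 bytes per transmitted coordinate
# (a 64-bit value, matching FPP = 64, plus an assumed 32-bit index).
bytes_per_entry = 12
cost_small(K) = packet_cost(K * bytes_per_entry; c0 = 64, c1 = 128, Pmax = 128)
cost_large(K) = packet_cost(K * bytes_per_entry; c0 = 64, c1 = 576, Pmax = 512)

# Either cost model can be passed to the CAT sketch, e.g. cat_topk_step(x, grad, cost_small; L = L).
```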