A Flexible Framework for Communication-Efficient Machine Learning
Authors: Sarit Khirirat, Sindri Magnússon, Arda Aytekin, Mikael Johansson
AAAI 2021, pp. 8101-8109 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretical results and practical experiments indicate that the automatic tuning strategies significantly increase communication efficiency on several state-of-the-art compression schemes. We evaluate the performance of our CAT framework for dynamic sparsification and quantization (S+Q) in the single-master, single-worker setup on the URL data set with 2.4 million data points and 3.2 million features. We compare the results to gradient descent and Alistarh's S+Q (Alistarh et al. 2017). We implement all algorithms in Julia, and run them on 4 nodes using MPI, splitting the data evenly between the nodes. The right-most plot in Figure 3 shows that our CAT S+Q outperforms all other compression schemes. |
| Researcher Affiliation | Collaboration | Sarit Khirirat 1, Sindri Magnússon 2, Arda Aytekin 3, Mikael Johansson 1 — 1 Division of Decision and Control Systems, KTH Royal Institute of Technology, Sweden; 2 Department of Computer and System Science, Stockholm University, Sweden; 3 Ericsson, Sweden |
| Pseudocode | Yes | Step 1: $T_i = \operatorname{argmax}_{T \subseteq [1,d]} \|\nabla_T F(x_i)\|^2 / (2L\,\mathcal{C}(T))$; Step 2: $x_{i+1} = x_i - \tfrac{1}{L} Q_{T_i}(\nabla F(x_i))$ (9). Step 1: $T_i = \operatorname{argmax}_{T \subseteq [1,d]} \beta_i(T)/\mathcal{C}(T)$; Step 2: $\gamma_i = \langle \nabla F(x_i), Q_{T_i}(\nabla F(x_i)) \rangle^2 / \|\nabla_{T_i} F(x_i)\|_2^4$; Step 3: $x_{i+1} = x_i - \gamma_i Q_{T_i}(\nabla F(x_i))$ (12). Step 1: $T_i = \operatorname{argmax}_{T \subseteq [1,d]} \omega_i(T)/\mathcal{C}(T)$; Step 2: $\gamma_i = \omega_i(T_i)/L$; Step 3: $x_{i+1} = x_i - \gamma_i Q_{T_i}(\nabla F(x_i))$ (15). A runnable sketch of rule (9) is given below the table. |
| Open Source Code | No | The paper mentions using 'POLO (Aytekin, Biel, and Johansson 2018)' but does not state that the code for the specific methodology described in this paper is open-source or provide a link. |
| Open Datasets | Yes | We evaluate the performance of our CAT framework for dynamic sparsification and quantization (S+Q) in the single-master, single-worker setup on the URL data set with 2.4 million data points and 3.2 million features. We evaluate the performance of our CAT tuning rules on deterministic sparsification (SG), stochastic sparsification (SS), and sparsification with quantization (S+Q) in a multi-node setting on RCV1. |
| Dataset Splits | No | The paper mentions using the RCV1 and URL datasets but does not specify how they were split into training, validation, and test sets, nor does it refer to standard splits. |
| Hardware Specification | No | The paper describes the network setup ('1000 Mbit Internet connection', '4 nodes using MPI') but does not specify any particular hardware components like CPU or GPU models used for computation. |
| Software Dependencies | No | The paper mentions 'ZMQ library', 'C++ library POLO', and 'Julia, and run them on 4 nodes using MPI', but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We ran 30,000 iterations with step-size according to Lemma 3. After obtaining a linear fit to the measured communication cost (see the supplementary materials for details), we ran 30,000 iterations with step-size according to Lemma 3. In both cases, the floating-point precision is FPP = 64. We use the packet communication model (6) with c1 = 128 bytes, c0 = 64 bytes and Pmax = 128 bytes. In all cases we use the packet communication model (6) with c1 = 576 bytes, c0 = 64 bytes and Pmax = 512 bytes. An illustrative cost computation using these constants appears below the table. |
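
The Pseudocode row above gives the CAT update rules only as compressed step lists. The following is a minimal Python sketch of rule (9) for deterministic sparsification, assuming the communication cost depends only on the number of transmitted coordinates (so the best index set of each size is the set of top-magnitude coordinates); the function name `cat_sparsified_step` and the `comm_cost` interface are illustrative choices, not from the paper.

```python
import numpy as np

def cat_sparsified_step(x, grad, L, comm_cost):
    """One iteration of rule (9): pick the index set T maximizing
    (||grad_T||^2 / (2L)) / C(|T|), then take a gradient step on T only.

    comm_cost(k) is the cost of transmitting k coordinates; because the cost
    depends only on |T|, the best set of size k is the k largest-magnitude entries.
    """
    order = np.argsort(-np.abs(grad))            # coordinates by decreasing magnitude
    cum_sq = np.cumsum(grad[order] ** 2)         # ||grad_T||^2 for the top-k sets, k = 1..d
    costs = np.array([comm_cost(k) for k in range(1, len(grad) + 1)], dtype=float)
    ratios = (cum_sq / (2 * L)) / costs          # Step 1: benefit per unit of communication
    k_best = int(np.argmax(ratios)) + 1          # best number of transmitted coordinates
    T = order[:k_best]

    sparse_grad = np.zeros_like(grad)
    sparse_grad[T] = grad[T]                     # Q_T(grad): keep only coordinates in T
    return x - sparse_grad / L                   # Step 2: x_{i+1} = x_i - (1/L) Q_T(grad)
```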
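
The Experiment Setup row fixes the constants of the packet communication model (6) but not its functional form. As a purely illustrative companion to the sketch above, the snippet below assumes one plausible packet model: each transmitted coordinate carries an FPP-bit value plus a coordinate index, the payload is split into packets of at most Pmax bytes, and every packet is billed c0 + c1 bytes. This form, the way c0, c1, and Pmax enter it, the helper name `packet_cost`, and the random quadratic test problem are all assumptions made for the example, not the paper's equation (6).

```python
import math
import numpy as np

def packet_cost(k, d, fpp=64, c0=64, c1=128, p_max=128):
    """Illustrative packet cost (bytes) for sending k of d coordinates.

    Assumed form, NOT the paper's eq. (6): each coordinate needs fpp bits for
    its value plus ceil(log2(d)) bits for its index; the payload is split into
    packets of at most p_max bytes, and every packet is billed c0 + c1 bytes.
    """
    bits_per_coord = fpp + math.ceil(math.log2(d))
    payload_bytes = math.ceil(k * bits_per_coord / 8)
    n_packets = math.ceil(payload_bytes / p_max)
    return n_packets * (c0 + c1)

# Usage with the CAT step sketched above, on a random least-squares problem.
rng = np.random.default_rng(0)
d = 1000
A = rng.standard_normal((d, d)) / math.sqrt(d)
b = rng.standard_normal(d)
L = np.linalg.norm(A.T @ A, 2)                  # smoothness constant of 0.5*||Ax - b||^2
x = np.zeros(d)
for _ in range(100):
    grad = A.T @ (A @ x - b)
    x = cat_sparsified_step(x, grad, L, lambda k: packet_cost(k, d))
```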