DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning
Authors: Hang Xu, Kelly Kostopoulou, Aritra Dutta, Xin Li, Alexandros Ntoulas, Panos Kalnis
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with real models demonstrate that DeepReduce transmits 320% less data than existing sparsifiers, without affecting accuracy. (A top-k sparsification sketch follows the table.) |
| Researcher Affiliation | Academia | Hang Xu, KAUST, hang.xu@kaust.edu.sa; Kelly Kostopoulou, Columbia University, kelkost@cs.columbia.edu; Aritra Dutta, KAUST, aritra.dutta@kaust.edu.sa; Xin Li, University of Central Florida, xin.li@ucf.edu; Alexandros Ntoulas, NKUA, antoulas@di.uoa.gr; Panos Kalnis, KAUST, panos.kalnis@kaust.edu.sa |
| Pseudocode | Yes | We present the pseudo-code of policy P2 in Algorithm 1, Appendix B.5. |
| Open Source Code | Yes | Code is available at https://github.com/hangxu0304/DeepReduce. |
| Open Datasets | Yes | We employ the popular FedML [33] benchmark that uses an LSTM model [59] to perform next-word prediction in a federated learning setting, on the Stack Overflow [67] dataset with 135,818,730 training and 16,586,035 test examples. Table 1 (Benchmarks and datasets; the last column shows the best quality achieved by the no-compression baseline): CNN, ResNet-20 [34], image classification, CIFAR-10 [48], 269,722 parameters, SGD-M [73], TFlow, Top-1 Acc., 90.94%; CNN, DenseNet40-K12 [37], image classification, CIFAR-10 [48], 357,491 parameters, SGD-M [73], TFlow, Top-1 Acc., 91.76%; CNN, ResNet-50 [34], image classification, ImageNet [17], 25,557,032 parameters, SGD-M [73], TFlow, Top-1 Acc., 73.78%; MLP, NCF [35], recommendation, Movielens-20M [56], 31,832,577 parameters, Adam [46], PyTorch, Best Hit Rate, 94.97%; RNN, LSTM [59], next-word prediction, Stack Overflow [67], 4,053,428 parameters, FedAvg [55], PyTorch, Top-1 Acc., 18.56%. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test split percentages or counts for the datasets used. While it mentions '135,818,730 training and 16,586,035 test examples' for Stack Overflow, it does not specify a validation set or a formal split methodology for the other datasets. |
| Hardware Specification | Yes | Each instance is equipped with a 4-core Intel CPU @ 2.50GHz, 16GB RAM, and an NVIDIA Tesla T4 GPU with 16 GB on-board memory (see Appendix F.1 for details). We also run simulated deployments on a local cluster of 8 nodes, each with a 16-core Intel CPU @ 2.6GHz, 512GB RAM, one NVIDIA Tesla V100 GPU with 16 GB on-board memory and 100Gbps network. |
| Software Dependencies | No | The paper mentions 'DeepReduce supports TensorFlow and PyTorch' but does not specify their version numbers or any other software dependencies with version information. |
| Experiment Setup | Yes | Each client executes 1 local epoch; the learning rate is 0.3 and the batch size is 16. (A minimal FedAvg local-update sketch follows the table.) |
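To make the headline claim concrete: DeepReduce works on sparse tensors represented as (indices, values) pairs, which the framework transmits and compresses independently. Below is a minimal top-k sparsification sketch in PyTorch, offered purely as a reading aid; `topk_sparsify` and its signature are our own illustration, not DeepReduce's API.

```python
import torch

def topk_sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep the largest-magnitude `ratio` fraction of gradient entries.

    Returns (indices, values): the two components of the sparse-tensor
    representation that a DeepReduce-style framework communicates and
    compresses independently. Illustrative sketch, not the paper's code.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))        # number of entries to keep
    _, indices = torch.topk(flat.abs(), k)       # positions of largest |g_i|
    return indices, flat[indices]                # signed values at those positions

# Usage: sparsify a gradient before communication.
grad = torch.randn(269_722)  # e.g., ResNet-20's parameter count from Table 1
indices, values = topk_sparsify(grad, ratio=0.01)
```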
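The experiment-setup row maps directly onto a single FedAvg client step. The sketch below, in PyTorch, assumes plain SGD and a cross-entropy loss as the local objective; only the hyperparameters (1 local epoch, learning rate 0.3, batch size 16) come from the paper, and `local_update` is a hypothetical helper name.

```python
import torch
from torch.utils.data import DataLoader

def local_update(model, dataset, lr=0.3, batch_size=16, local_epochs=1):
    """One client's FedAvg round: train locally, return updated weights.

    Hyperparameters mirror the paper's LSTM / Stack Overflow setup
    (1 local epoch, lr 0.3, batch size 16); the optimizer and loss are
    illustrative assumptions, not taken from the paper.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(local_epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()  # the server averages these across clients

# Usage with a toy dataset (illustrative only):
# from torch.utils.data import TensorDataset
# ds = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
# weights = local_update(torch.nn.Linear(10, 2), ds)
```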