DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning

Authors: Hang Xu, Kelly Kostopoulou, Aritra Dutta, Xin Li, Alexandros Ntoulas, Panos Kalnis

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments with real models demonstrate that DeepReduce transmits 320% less data than existing sparsifiers, without affecting accuracy. (See the top-k sparsification sketch following this table.)
Researcher Affiliation | Academia | Hang Xu (KAUST, hang.xu@kaust.edu.sa); Kelly Kostopoulou (Columbia University, kelkost@cs.columbia.edu); Aritra Dutta (KAUST, aritra.dutta@kaust.edu.sa); Xin Li (University of Central Florida, xin.li@ucf.edu); Alexandros Ntoulas (NKUA, antoulas@di.uoa.gr); Panos Kalnis (KAUST, panos.kalnis@kaust.edu.sa)
Pseudocode | Yes | We present the pseudo-code of policy P2 in Algorithm 1, Appendix B.5.
Open Source Code | Yes | Code is available at https://github.com/hangxu0304/DeepReduce.
Open Datasets | Yes | Benchmarks. We employ the popular FedML [33] benchmark that uses an LSTM model [59] to perform next-word prediction in a federated learning setting, on the Stack Overflow [67] dataset with 135,818,730 training and 16,586,035 test examples. Table 1 (benchmarks and datasets; the last column shows the best quality achieved by the no-compression baseline) is reproduced below; see also the dataset-loading sketch after this table.
Type | Model | Task | Dataset | Parameters | Optimizer | Platform | Metric | Baseline
CNN | ResNet-20 [34] | Image classif. | CIFAR-10 [48] | 269,722 | SGD-M [73] | TFlow | Top-1 Acc. | 90.94%
CNN | DenseNet40-K12 [37] | Image classif. | CIFAR-10 [48] | 357,491 | SGD-M [73] | TFlow | Top-1 Acc. | 91.76%
CNN | ResNet-50 [34] | Image classif. | ImageNet [17] | 25,557,032 | SGD-M [73] | TFlow | Top-1 Acc. | 73.78%
MLP | NCF [35] | Recommendation | MovieLens-20M [56] | 31,832,577 | Adam [46] | PyTorch | Best Hit Rate | 94.97%
RNN | LSTM [59] | Next word pred. | Stack Overflow [67] | 4,053,428 | FedAvg [55] | PyTorch | Top-1 Acc. | 18.56%
Dataset Splits | No | The paper does not explicitly provide train/validation/test split percentages or counts for the datasets used. It mentions 135,818,730 training and 16,586,035 test examples for Stack Overflow, but it specifies no validation set and no formal split methodology for the other datasets.
Hardware Specification | Yes | Each instance is equipped with a 4-core Intel CPU @ 2.50GHz, 16GB RAM, and an NVIDIA Tesla T4 GPU with 16GB on-board memory (see Appendix F.1 for details). We also run simulated deployments on a local cluster of 8 nodes, each with a 16-core Intel CPU @ 2.6GHz, 512GB RAM, one NVIDIA Tesla V100 GPU with 16GB on-board memory, and a 100Gbps network.
Software Dependencies | No | The paper mentions that 'DeepReduce supports TensorFlow and PyTorch' but does not specify their version numbers or any other software dependencies with version information.
Experiment Setup | Yes | Each client executes 1 local epoch; the learning rate is 0.3 and the batch size is 16. (See the client-update sketch below for how these hyperparameters fit together.)
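For context on the "existing sparsifiers" quoted in the Research Type row: sparsifiers transmit a gradient as an (indices, values) pair instead of a dense tensor, and DeepReduce's contribution is compressing those two components further before transmission. Below is a minimal top-k sparsifier sketch in PyTorch; the 1% density ratio and the helper names are our own illustrative choices, not the paper's code.

```python
import torch

def topk_sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries.

    Returns (indices, values, numel): the sparse representation that a
    framework like DeepReduce would compress further before transmission.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    # Select the k entries with the largest absolute value.
    _, indices = torch.topk(flat.abs(), k)
    values = flat[indices]
    return indices, values, flat.numel()

def densify(indices, values, numel):
    """Reconstruct a dense tensor from the sparse (indices, values) pair."""
    out = torch.zeros(numel, dtype=values.dtype)
    out[indices] = values
    return out

# Example: sparsify a fake gradient to 1% density.
g = torch.randn(269_722)               # e.g., ResNet-20's parameter count
idx, val, n = topk_sparsify(g, 0.01)
g_hat = densify(idx, val, n)
```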
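On the Open Datasets row: CIFAR-10, used by the ResNet-20 and DenseNet40-K12 benchmarks, can be fetched programmatically, which is one way to start reproducing Table 1 (ImageNet, MovieLens-20M, and Stack Overflow must be obtained from their respective distributors). A minimal sketch assuming torchvision is installed; the 45,000/5,000 validation carve-out is our illustration, since per the Dataset Splits row the paper specifies no validation split.

```python
import torch
from torchvision import datasets, transforms

# CIFAR-10 downloads automatically to ./data on first use.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transforms.ToTensor())

# The paper reports no validation split; a common workaround is to carve
# one out of the 50,000 training images (split sizes are illustrative).
train_subset, val_subset = torch.utils.data.random_split(
    train_set, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0))
```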
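On the Experiment Setup row: a minimal sketch of where the three quoted hyperparameters (1 local epoch, learning rate 0.3, batch size 16) sit inside a FedAvg client update. The model, loss function, and data pipeline are placeholders, not the paper's FedML/LSTM code.

```python
import copy
import torch
from torch.utils.data import DataLoader

def client_update(global_model, local_dataset,
                  lr=0.3, batch_size=16, local_epochs=1):
    """One FedAvg client round with the hyperparameters quoted above."""
    model = copy.deepcopy(global_model)        # start from the server model
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(local_dataset, batch_size=batch_size, shuffle=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(local_epochs):              # "each client executes 1 local epoch"
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model.state_dict()                  # shipped (compressed) to the server
```

In a DeepReduce-style deployment, the returned update would be sparsified and compressed before being averaged by the server, rather than sent dense as in this sketch.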