Sketching for Distributed Deep Learning: A Sharper Analysis
Authors: Mayank Shrivastava, Berivan Isik, Qiaobo Li, Sanmi Koyejo, Arindam Banerjee
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present empirical results both on the loss Hessian and overall accuracy of sketch-DL supporting our theoretical results. Taken together, our results provide theoretical justification for the observed empirical success of sketch-DL. In this section, we provide a comparison of the sketching approach in Algorithm 1 with other common approaches such as local Top-r [44] and FetchSGD [59]. ... We train ResNet-18 [28] on the CIFAR-10 dataset [38], distributed i.i.d. across 100 clients. Each client performs 5 local gradient descent iterations (i.e., using a full batch of size 500) at every round. Figure 1 shows that the Count-Sketch-based distributed learning approach in Algorithm 1 performs competitively with FetchSGD. |
| Researcher Affiliation | Collaboration | Mayank Shrivastava (University of Illinois Urbana-Champaign, mayanks4@illinois.edu); Berivan Isik (Google, berivan@google.com); Qiaobo Li (University of Illinois Urbana-Champaign, qiaobol2@illinois.edu); Sanmi Koyejo (Stanford University, sanmi@cs.stanford.edu); Arindam Banerjee (University of Illinois Urbana-Champaign, arindamb@illinois.edu) |
| Pseudocode | Yes | Algorithm 1 Sketching-Based Distributed Learning. Hyperparameters: server learning rate η_global, local learning rate η_local. Inputs: local datasets D_c of size n_c for clients c = 1, ..., C; number of communication rounds T. Output: final model θ_T. (An illustrative sketch of the algorithm's compression primitive appears below the table.) |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Due to confidentiality constraints, we are unable to share the code at this time, but we provide sufficient details to reproduce the results. |
| Open Datasets | Yes | We train ResNet-18 [28] on the CIFAR-10 dataset [38], distributed i.i.d. across 100 clients. |
| Dataset Splits | No | The paper states 'We train ResNet-18 [28] on CIFAR-10 dataset [38]...' but does not explicitly report training/validation/test splits or percentages. While CIFAR-10 has standard splits, the paper does not specify how they were used, noting only that 'Each client performs 5 local gradient descent iterations (i.e., using full-batch of size 500) at every round.' |
| Hardware Specification | Yes | We conducted our experiments on NVIDIA Titan X GPUs on an internal cluster server, using one GPU per run. |
| Software Dependencies | No | The paper does not provide version numbers for the software dependencies used in its experiments, such as the programming language or deep learning framework (e.g., Python, PyTorch, or TensorFlow versions). While it mentions 'PyHessian' in Appendix G, it does not specify a version or list other software with versions for the main experimental setup. |
| Experiment Setup | Yes | We train ResNet-18 [28] on the CIFAR-10 dataset [38], distributed i.i.d. across 100 clients. Each client performs 5 local gradient descent iterations (i.e., using a full batch of size 500) at every round. ... We use a learning rate of 1e-3, SGD as the optimizer, and perform GD. (A toy end-to-end loop using these settings appears below the table.) |
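
Since the code is not released, a minimal sketch of Algorithm 1's communication primitive may help readers attempting reproduction. The block below is an illustrative NumPy implementation of a Count Sketch, the linear compression the algorithm relies on; the class name `CountSketch`, the table sizes `rows` and `cols`, and the seed are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

class CountSketch:
    """Minimal Count Sketch over R^d: a linear map into a rows x cols table.

    Illustrative only. Linearity is the property Algorithm 1 exploits:
    the average of the clients' sketches equals the sketch of the
    averaged update, so the server never needs the raw d-dim vectors.
    """

    def __init__(self, d, rows=5, cols=512, seed=0):
        rng = np.random.default_rng(seed)
        self.rows, self.cols = rows, cols
        # Per (row j, coordinate i): a hash bucket h_j(i) and a sign s_j(i).
        self.buckets = rng.integers(0, cols, size=(rows, d))
        self.signs = rng.choice([-1.0, 1.0], size=(rows, d))

    def sketch(self, g):
        """Compress a length-d vector g into the (rows x cols) table."""
        S = np.zeros((self.rows, self.cols))
        for j in range(self.rows):
            # Scatter-add s_j(i) * g[i] into bucket h_j(i) of row j.
            np.add.at(S[j], self.buckets[j], self.signs[j] * g)
        return S

    def unsketch(self, S):
        """Estimate g coordinate-wise via the median of the rows' reads."""
        reads = np.stack([self.signs[j] * S[j, self.buckets[j]]
                          for j in range(self.rows)])
        return np.median(reads, axis=0)
```

As a general property of Count Sketch, recovery is accurate for heavy coordinates and noisier for dense updates, which is why sketch sizes are typically chosen relative to the update's effective sparsity.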
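
Below is a toy end-to-end loop mirroring the structure of Algorithm 1 with the hyperparameters quoted above (100 clients, 5 local full-batch GD steps over 500 samples each, local learning rate 1e-3). The least-squares objective, the dimension `d`, the sketch sizes, the number of rounds, and the server learning rate are stand-ins; the paper's ResNet-18/CIFAR-10 pipeline is not reproduced here. It reuses the `CountSketch` class from the previous block.

```python
import numpy as np

# Quoted from the paper's setup: 100 clients, 5 local full-batch GD
# steps over 500 samples each, local learning rate 1e-3. Everything
# else below is an illustrative stand-in.
C, LOCAL_STEPS, LR_LOCAL = 100, 5, 1e-3
LR_GLOBAL, ROUNDS, N_PER_CLIENT, d = 1.0, 20, 500, 256

rng = np.random.default_rng(1)
A = [rng.standard_normal((N_PER_CLIENT, d)) / np.sqrt(d) for _ in range(C)]
b = [rng.standard_normal(N_PER_CLIENT) for _ in range(C)]

cs = CountSketch(d, rows=5, cols=512)   # hash functions shared by all clients
theta = np.zeros(d)                     # global model

for t in range(ROUNDS):
    S_sum = np.zeros((cs.rows, cs.cols))
    for c in range(C):
        w = theta.copy()
        for _ in range(LOCAL_STEPS):    # local full-batch gradient descent
            grad = A[c].T @ (A[c] @ w - b[c]) / N_PER_CLIENT
            w -= LR_LOCAL * grad
        S_sum += cs.sketch(w - theta)   # only the sketch is communicated
    # By linearity, the averaged sketch is the sketch of the averaged update.
    theta += LR_GLOBAL * cs.unsketch(S_sum / C)
```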