Sketching for Distributed Deep Learning: A Sharper Analysis

Authors: Mayank Shrivastava, Berivan Isik, Qiaobo Li, Sanmi Koyejo, Arindam Banerjee

NeurIPS 2024

Reproducibility Variables (Result and LLM Response for each)
Research Type: Experimental
LLM Response: We present empirical results both on the loss Hessian and overall accuracy of sketch-DL supporting our theoretical results. Taken together, our results provide theoretical justification for the observed empirical success of sketch-DL. In this section, we provide a comparison of the sketching approach in Algorithm 1 with other common approaches such as local Top-r [44] and FetchSGD [59]. ... We train ResNet-18 [28] on the CIFAR-10 dataset [38] that is i.i.d. distributed to 100 clients. Each client performs 5 local gradient descent iterations (i.e., using a full batch of size 500) at every round. Figure 1 shows that the Count-Sketch-based distributed learning approach in Algorithm 1 performs competitively with FetchSGD.
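To make the compression primitive behind this comparison concrete, the following is a minimal NumPy sketch of Count Sketch compression and median-based decompression of a gradient vector. The function names and the (width, depth) parameterization are our own illustration, not the authors' code; note that all clients must share the same hash buckets and signs so that their sketches can be summed.

    import numpy as np

    def make_count_sketch(d, width, depth, seed=0):
        # Shared hash buckets h_j(i) and random signs s_j(i)
        # for a depth x width Count Sketch table.
        rng = np.random.default_rng(seed)
        buckets = rng.integers(0, width, size=(depth, d))
        signs = rng.choice([-1.0, 1.0], size=(depth, d))
        return buckets, signs

    def sketch(g, buckets, signs, width):
        # Compress a d-dimensional vector g into a depth x width table.
        S = np.zeros((buckets.shape[0], width))
        for j in range(buckets.shape[0]):
            np.add.at(S[j], buckets[j], signs[j] * g)
        return S

    def unsketch(S, buckets, signs):
        # Estimate each coordinate of g as the median over the depth rows.
        rows = np.arange(S.shape[0])[:, None]
        return np.median(signs * S[rows, buckets], axis=0)

Because the sketch is linear in g, a server can average the clients' sketches and call unsketch only once per round.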
Researcher Affiliation: Collaboration
LLM Response: Mayank Shrivastava, University of Illinois Urbana-Champaign (mayanks4@illinois.edu); Berivan Isik, Google (berivan@google.com); Qiaobo Li, University of Illinois Urbana-Champaign (qiaobol2@illinois.edu); Sanmi Koyejo, Stanford University (sanmi@cs.stanford.edu); Arindam Banerjee, University of Illinois Urbana-Champaign (arindamb@illinois.edu)
Pseudocode: Yes
LLM Response: Algorithm 1 (Sketching-Based Distributed Learning). Hyperparameters: server learning rate η_global, local learning rate η_local. Inputs: local datasets D_c of size n_c for clients c = 1, ..., C; number of communication rounds T. Output: final model θ_T.
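Below is a schematic but runnable rendering of the communication pattern such an algorithm implies, reusing the Count Sketch helpers above on a toy quadratic objective. The Client class, the choice of model deltas as the sketched quantity, and all hyperparameter values are illustrative assumptions, not the paper's Algorithm 1 verbatim.

    class Client:
        # Toy objective f_c(theta) = 0.5 * ||theta - mu_c||^2, whose
        # gradient is (theta - mu_c); stands in for a local dataset D_c.
        def __init__(self, mu):
            self.mu = mu

        def local_gd(self, theta, eta_local, steps):
            theta = theta.copy()
            for _ in range(steps):
                theta -= eta_local * (theta - self.mu)
            return theta

    def run_rounds(theta, clients, T, eta_local, eta_global,
                   sk, unsk, local_steps=5):
        for _ in range(T):
            # Each client sketches its local model delta; linearity lets
            # the server average in sketch space and unsketch once.
            avg = sum(sk(c.local_gd(theta, eta_local, local_steps) - theta)
                      for c in clients) / len(clients)
            theta = theta + eta_global * unsk(avg)
        return theta

A usage example with the helpers defined earlier:

    rng = np.random.default_rng(1)
    d, width, depth = 1_000, 200, 5
    buckets, signs = make_count_sketch(d, width, depth)
    sk = lambda v: sketch(v, buckets, signs, width)
    unsk = lambda S: unsketch(S, buckets, signs)
    clients = [Client(rng.standard_normal(d)) for _ in range(10)]
    theta_T = run_rounds(np.zeros(d), clients, T=50,
                         eta_local=0.1, eta_global=1.0, sk=sk, unsk=unsk)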
Open Source Code: No
LLM Response: Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in the supplemental material? Answer: [No] Justification: Due to confidentiality constraints, we are unable to share the code at this time, but we provide sufficient details to reproduce the results.
Open Datasets: Yes
LLM Response: We train ResNet-18 [28] on the CIFAR-10 dataset [38] that is i.i.d. distributed to 100 clients.
Dataset Splits: No
LLM Response: The paper states "We train ResNet-18 [28] on CIFAR-10 dataset [38]..." but does not explicitly provide the training/validation/test splits or percentages. While CIFAR-10 has standard splits, the paper does not specify how they were used, noting only that "Each client performs 5 local gradient descent iterations (i.e., using full-batch of size 500) at every round."
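The per-client split can be partially reconstructed from the stated numbers: CIFAR-10's standard training set has 50,000 images, so an i.i.d. split across 100 clients gives 500 images per client, consistent with the quoted full batch of size 500. A plausible, but unconfirmed, reconstruction of the partition:

    import numpy as np

    def iid_partition(num_examples=50_000, num_clients=100, seed=0):
        # Shuffle the training indices once and deal them out evenly.
        rng = np.random.default_rng(seed)
        return np.array_split(rng.permutation(num_examples), num_clients)

    shards = iid_partition()
    assert all(len(s) == 500 for s in shards)  # 50,000 / 100 clients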
Hardware Specification: Yes
LLM Response: We conducted our experiments on NVIDIA Titan X GPUs on an internal cluster server, using one GPU per run.
Software Dependencies: No
LLM Response: The paper does not provide specific version numbers for software dependencies used in its experiments, such as programming languages or deep learning frameworks (e.g., Python, PyTorch, TensorFlow versions). While it mentions PyHessian in Appendix G, it does not specify a version or list other software with versions for the main experimental setup.
Experiment Setup: Yes
LLM Response: We train ResNet-18 [28] on the CIFAR-10 dataset [38] that is i.i.d. distributed to 100 clients. Each client performs 5 local gradient descent iterations (i.e., using a full batch of size 500) at every round. ... We use a learning rate of 1e-3, SGD as the optimizer, and perform GD.
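As a sanity check on the quoted setup, here is a minimal PyTorch rendering of one client's local routine: five full-batch gradient descent steps with plain SGD at learning rate 1e-3. The function name and signature are our own assumptions; the paper does not publish code.

    import torch

    def local_full_batch_gd(model, loss_fn, x, y, lr=1e-3, steps=5):
        # x and y hold the client's entire 500-example shard, so each
        # optimizer step is one full-batch gradient descent iteration.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        return model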