On Sampling Strategies for Spectral Model Sharding
Authors: Denis Korzhenkov, Christos Louizos
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that both of these methods can lead to improved performance on various commonly used datasets. |
| Researcher Affiliation | Industry | Denis Korzhenkov Qualcomm AI Research Amsterdam, The Netherlands dkorzhen@qti.qualcomm.com Christos Louizos Qualcomm AI Research Amsterdam, The Netherlands clouizos@qti.qualcomm.com Qualcomm AI Research, Qualcomm Technologies Netherlands B.V. (Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.). |
| Pseudocode | Yes | Algorithm 1 Inclusion Probabilities for the Unbiased Strategy ... Algorithm 2 Inclusion Probabilities and Auxiliary Multipliers for the Collective Strategy |
| Open Source Code | No | The code cannot be released at the moment due to copyright procedures. |
| Open Datasets | Yes | For CIFAR-10 [23] we split the data with α = 1 and conduct experiments with a ResNet-18 model [15]... For Tiny ImageNet [31] we use α = 10 and a compact transformer (CCT) model [13]... For CIFAR-100 [23] the two-staged Pachinko allocation method (PAM) [26] is used... We select Shakespeare [29] as an example of a dataset with a natural data split over clients. |
| Dataset Splits | No | The paper specifies training and testing, but does not explicitly mention a validation set or validation splits for the datasets. |
| Hardware Specification | No | All our experiments were conducted with a single GPU and required not more than 10 Gb VRAM. |
| Software Dependencies | No | The paper mentions "NumPy [12] method numpy.random.choice" but does not specify version numbers for NumPy or any other software dependencies. |
| Experiment Setup | Yes | The initial value for learning rate is 0.1 for CIFAR-10, 0.05 for CIFAR-100 and Tiny ImageNet, and 0.1 for the Shakespeare data. The client's batch size equals 32, 64, 128, and 10 respectively. All experiments are run with three random seeds which also affect data splitting between clients, if applicable. ... In each communication round all participating clients train their sub-models for two local epochs. The total number of local epochs ... equals 2,000 for CIFAR-10, 3,000 for CIFAR-100, 5,000 for Tiny ImageNet and 3,000 for Shakespeare. ... we use Frobenius weight decay (FD) during local training ... The weight of FD in the resulting loss function is set to 1 × 10⁻⁴. Additionally ... we found it necessary to use plain SGD with momentum weight of 0.9 during local optimization of the sub-model weights. |
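
The Experiment Setup row above pins down the local optimization recipe: plain SGD with momentum 0.9 and a Frobenius weight decay (FD) term weighted by 1 × 10⁻⁴ in the loss. Below is a minimal PyTorch-style sketch of how such a local step could be assembled. It assumes the FD term penalizes the squared Frobenius norm of each reconstructed low-rank weight matrix; the helper names `frobenius_decay`, `factor_pairs`, and `local_step` are illustrative and not taken from the paper, whose exact formulation may differ.

```python
import torch

# Hyperparameters quoted in the table above.
FD_WEIGHT = 1e-4   # weight of the Frobenius decay (FD) term in the loss
MOMENTUM = 0.9     # plain SGD with momentum during local optimization


def frobenius_decay(factor_pairs):
    """Sum of squared Frobenius norms of the reconstructed matrices U @ V^T.

    `factor_pairs` is assumed to be a list of (U, V) tensors holding the
    sampled spectral components of each sharded layer; this pairing is an
    illustrative assumption, not the paper's exact parameterization.
    """
    return sum((u @ v.t()).pow(2).sum() for u, v in factor_pairs)


def local_step(model, factor_pairs, batch, criterion, optimizer):
    """One local SGD step on a client's sub-model with the FD penalty added."""
    inputs, targets = batch
    loss = criterion(model(inputs), targets) + FD_WEIGHT * frobenius_decay(factor_pairs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Example optimizer construction (learning rate 0.1, as reported for CIFAR-10):
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=MOMENTUM)
```

Penalizing the product of the factors (rather than each factor separately) mirrors weight decay on the full-rank matrix, which is one plausible reading of "Frobenius weight decay"; readers should defer to the paper's own definition.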