Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence
Authors: Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted experiments with various topologies, demonstrating that the BASE-(k + 1) GRAPH enables various decentralized learning methods to achieve higher accuracy with better communication efficiency than the existing topologies. |
| Researcher Affiliation | Collaboration | Yuki Takezawa (1,2), Ryoma Sato (1,2), Han Bao (1,2), Kenta Niwa (3), Makoto Yamada (2); 1: Kyoto University, 2: OIST, 3: NTT Communication Science Laboratories |
| Pseudocode | Yes | Algorithm 1 k-PEER HYPER-HYPERCUBE GRAPH H_k(V), Algorithm 2 SIMPLE BASE-(k+1) GRAPH A^simple_k(V), Algorithm 3 BASE-(k+1) GRAPH A_k(V) |
| Open Source Code | Yes | Our code is available at https://github.com/yukiTakezawa/BaseGraph. |
| Open Datasets | Yes | We used three datasets, Fashion-MNIST [41] and CIFAR-{10, 100} [14], and used LeNet [15] for Fashion-MNIST and VGG-11 [32] for CIFAR-{10, 100}. |
| Dataset Splits | No | The paper mentions tuning the learning rate by grid search but does not specify validation dataset splits (e.g., percentage or sample count). |
| Hardware Specification | Yes | We ran all experiments on a server with eight Nvidia RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for it or any other software components. |
| Experiment Setup | Yes | The learning rate was tuned by grid search, and we used the cosine learning rate scheduler [22]. We distributed the training dataset to nodes by using Dirichlet distributions with hyperparameter α [7], conducting experiments in both homogeneous and heterogeneous data distribution settings. As α approaches zero, the data distributions held by each node become more heterogeneous. We repeated all experiments with three different seed values and reported their averages. See Sec. H for more detailed settings. Tables 3 and 4 list the detailed hyperparameter settings used in Secs. 6 and F.3. |
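The Dirichlet-based data split referenced in the setup row [7] is a standard recipe for simulating non-IID federated/decentralized data: for each class, a proportion vector is drawn from Dirichlet(α) and the class's samples are divided among nodes accordingly. A minimal NumPy sketch, assuming a hypothetical helper name `dirichlet_partition` (not from the paper's codebase):

```python
import numpy as np

def dirichlet_partition(labels, n_nodes, alpha, seed=0):
    """Split sample indices across nodes via a per-class Dirichlet(alpha) draw.

    Smaller alpha -> more heterogeneous (non-IID) per-node label
    distributions; large alpha approaches a uniform IID split.
    """
    rng = np.random.default_rng(seed)
    node_indices = [[] for _ in range(n_nodes)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class-c samples assigned to each node.
        props = rng.dirichlet(alpha * np.ones(n_nodes))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for node, part in enumerate(np.split(idx, cuts)):
            node_indices[node].extend(part.tolist())
    return node_indices
```

Every sample is assigned to exactly one node, so the per-node index lists partition the dataset; sweeping α (e.g. 10, 1, 0.1) reproduces the homogeneous-to-heterogeneous settings the table describes.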