Aggregating Capacity in FL through Successive Layer Training for Computationally-Constrained Devices
Authors: Kilian Pfeiffer, Ramin Khalili, Joerg Henkel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through extensive experimental evaluation that our technique greatly improves the accuracy of the trained model (by 52.4 p.p.) compared with the state of the art, efficiently aggregating the computation capacity available on distributed devices. |
| Researcher Affiliation | Collaboration | Kilian Pfeiffer (Karlsruhe Institute of Technology, Karlsruhe, Germany, kilian.pfeiffer@kit.edu); Ramin Khalili (Huawei Research Center Munich, Munich, Germany, ramin.khalili@huawei.com); Jörg Henkel (Karlsruhe Institute of Technology, Karlsruhe, Germany, henkel@kit.edu) |
| Pseudocode | Yes | Algorithm 1: Successive Layer Training: w and W label the set of all layers' parameters. |
| Open Source Code | Yes | The source code of SLT is available at https://github.com/k1l1/SLT. |
| Open Datasets | Yes | We evaluate SLT in an FL setting using PyTorch [21], where we distribute a share from the datasets CIFAR10, CIFAR100 [22], FEMNIST from the Leaf [23] benchmark, and TinyImageNet [24] to each device c ∈ C, s.t. each device c has a local dataset Dc of the same size. (A data-partitioning sketch follows the table.) |
| Dataset Splits | No | The paper mentions evaluating the full server model and monitoring 'test accuracy on the server' for early stopping, implying a test/validation set. However, it does not explicitly state the training/validation/test splits (e.g., percentages or sample counts) of the original datasets that would be needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We use batch size 32 and perform 363 experiments in total, with an average run-time of 6 h on an NVIDIA Tesla V100. |
| Software Dependencies | No | The paper mentions using PyTorch for the experiments, but it does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We train with the optimizer stochastic gradient descent (SGD) with momentum of 0.9, an initial learning rate of η = 0.1, and apply cosine annealing to η = 0.01 and a weight decay of 1.0 × 10⁻⁵. We use batch size 32 and perform 363 experiments in total, with an average run-time of 6 h on an NVIDIA Tesla V100. (A configuration sketch follows the table.) |
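
The equal-size local datasets Dc mentioned in the Open Datasets row can be illustrated with a simple split. This is a minimal sketch only: `num_devices` and the use of `random_split` are assumptions made for illustration, not the paper's partitioning code (the official SLT implementation is linked above); CIFAR-10 and batch size 32 are taken from the paper.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

num_devices = 10  # hypothetical |C|; the paper varies the number of devices

# CIFAR-10 training set as an example; the paper also uses CIFAR-100,
# FEMNIST (Leaf), and TinyImageNet.
dataset = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())

# Give every device c an equally sized local dataset D_c; any remainder
# that does not divide evenly is simply left out.
share = len(dataset) // num_devices
lengths = [share] * num_devices + [len(dataset) - share * num_devices]
local_datasets = random_split(dataset, lengths)[:num_devices]

loaders = [DataLoader(d, batch_size=32, shuffle=True) for d in local_datasets]
```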
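The optimizer and learning-rate schedule reported in the Experiment Setup row can be expressed as a short PyTorch configuration. The snippet below is a sketch under stated assumptions: `model` and `num_rounds` are hypothetical placeholders (the annealing horizon is not fixed here); only the SGD momentum (0.9), initial learning rate (0.1), cosine annealing to 0.01, and weight decay (1.0 × 10⁻⁵) come from the paper.

```python
import torch

model = torch.nn.Linear(10, 10)  # hypothetical placeholder model
num_rounds = 100                 # hypothetical annealing horizon

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate eta = 0.1
    momentum=0.9,       # SGD with momentum 0.9
    weight_decay=1e-5,  # weight decay of 1.0 x 10^-5
)

# Cosine annealing of the learning rate from 0.1 down to 0.01
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_rounds, eta_min=0.01
)
```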