Aggregating Capacity in FL through Successive Layer Training for Computationally-Constrained Devices
Authors: Kilian Pfeiffer, Ramin Khalili, Joerg Henkel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show through extensive experimental evaluation that our technique greatly improves the accuracy of the trained model (by 52.4 p.p.) compared with the state of the art, efficiently aggregating the computation capacity available on distributed devices. |
| Researcher Affiliation | Collaboration | Kilian Pfeiffer (Karlsruhe Institute of Technology, Karlsruhe, Germany, kilian.pfeiffer@kit.edu); Ramin Khalili (Huawei Research Center Munich, Munich, Germany, ramin.khalili@huawei.com); Jörg Henkel (Karlsruhe Institute of Technology, Karlsruhe, Germany, henkel@kit.edu) |
| Pseudocode | Yes | Algorithm 1: Successive Layer Training: w and W label the set of all layers' parameters. |
| Open Source Code | Yes | The source code of SLT is available at https://github.com/k1l1/SLT. |
| Open Datasets | Yes | We evaluate SLT in an FL setting using PyTorch [21], where we distribute a share from the datasets CIFAR10, CIFAR100 [22], FEMNIST from the Leaf [23] benchmark, and TinyImageNet [24] to each device c ∈ C, s.t. each device c has a local dataset Dc of the same size. (A data-partitioning sketch follows the table.) |
| Dataset Splits | No | The paper mentions evaluating the full server model and monitoring 'test accuracy on the server' for early stopping, implying a test/validation set. However, it does not explicitly state the training/validation/test splits (e.g., percentages or sample counts) of the original datasets that would be needed to reproduce the data partitioning. |
| Hardware Specification | Yes | We use batch size 32 and perform 363 experiments in total, with an average run-time of 6 h on an NVIDIA Tesla V100. |
| Software Dependencies | No | The paper mentions using PyTorch for the experiments, but it does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We train with the optimizer stochastic gradient descent (SGD) with momentum of 0.9, an initial learning rate of η = 0.1, and apply cosine annealing to η = 0.01 and a weight decay of 1.0 × 10⁻⁵. We use batch size 32 and perform 363 experiments in total, with an average run-time of 6 h on an NVIDIA Tesla V100. (A configuration sketch follows the table.) |
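
The equal-size local datasets Dc mentioned in the Open Datasets row can be illustrated with a simple split. This is a minimal sketch only: `num_devices` and the use of `random_split` are assumptions made for illustration, not the paper's partitioning code (the official SLT implementation is linked above); CIFAR-10 and batch size 32 are taken from the paper.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

num_devices = 10  # hypothetical |C|; the paper varies the number of devices

# CIFAR-10 training set as an example; the paper also uses CIFAR-100,
# FEMNIST (Leaf), and TinyImageNet.
dataset = datasets.CIFAR10(root="./data", train=True, download=True,
                           transform=transforms.ToTensor())

# Give every device c an equally sized local dataset D_c; any remainder
# that does not divide evenly is simply left out.
share = len(dataset) // num_devices
lengths = [share] * num_devices + [len(dataset) - share * num_devices]
local_datasets = random_split(dataset, lengths)[:num_devices]

loaders = [DataLoader(d, batch_size=32, shuffle=True) for d in local_datasets]
```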
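The optimizer and learning-rate schedule reported in the Experiment Setup row can be expressed as a short PyTorch configuration. The snippet below is a sketch under stated assumptions: `model` and `num_rounds` are hypothetical placeholders (the annealing horizon is not fixed here); only the SGD momentum (0.9), initial learning rate (0.1), cosine annealing to 0.01, and weight decay (1.0 × 10⁻⁵) come from the paper.

```python
import torch

model = torch.nn.Linear(10, 10)  # hypothetical placeholder model
num_rounds = 100                 # hypothetical annealing horizon

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate eta = 0.1
    momentum=0.9,       # SGD with momentum 0.9
    weight_decay=1e-5,  # weight decay of 1.0 x 10^-5
)

# Cosine annealing of the learning rate from 0.1 down to 0.01
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_rounds, eta_min=0.01
)
```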