Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization

Authors: Jaehong Yoon, Geon Park, Wonyong Jeong, Sung Ju Hwang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate ProWD against relevant FL baselines on the benchmark datasets, using clients with varying bitwidths. Our ProWD largely outperforms the baseline FL algorithms as well as naive approaches (e.g., grouped averaging) under the proposed BHFL scenario. (A hedged sketch of such grouped averaging appears after the table.)
Researcher Affiliation | Collaboration | (1) Korea Advanced Institute of Science and Technology (KAIST), South Korea; (2) AITRICS, South Korea.
Pseudocode | Yes | Algorithm 1: Training of progressive weight dequantizer.
Open Source Code | No | The paper does not contain an explicit statement or a direct link to open-source code for the methodology described.
Open Datasets | Yes | We use the widely used benchmark dataset for federated learning methods, CIFAR-10, to validate our method following the IID experimental settings of existing works (Reisizadeh et al., 2020; Haddadpour et al., 2021). CIFAR-10 is an image classification dataset that consists of 10 object classes, each of which has 5,000 training instances and 1,000 test instances.
Dataset Splits | No | CIFAR-10 is an image classification dataset that consists of 10 object classes, each of which has 5,000 training instances and 1,000 test instances. For FL purposes, we uniformly split the training instances per class by the number of clients participating in the federated learning system. (Explanation: The paper specifies training and test instances but does not explicitly mention a separate validation split for the main federated learning process, which is often crucial for hyperparameter tuning.) A sketch of the described IID per-class split appears after the table.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions general software components such as the SGD optimizer but does not provide specific version numbers for any key software dependencies or libraries.
Experiment Setup | Yes | At each round, we train each client for 200 local steps. The Float32 network is trained with SGD with a learning rate of 0.1 and momentum of 0.9. Additionally, the gradient ℓ2 norm is clipped to 2.0. ... We train a weight dequantizer φ with an SGD optimizer with a learning rate of 0.01 and a batch size of 16, for 5 epochs for all experiments. (Hedged sketches of the local training loop and the dequantizer training based on these settings appear after the table.)
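
The "grouped averaging" baseline mentioned under Research Type is not specified in this table. A minimal sketch, assuming it means averaging the parameters of clients that share the same bitwidth into one model per group, could look like the following; the function name and the per-group dictionary layout are illustrative assumptions, not the paper's definition.

```python
from collections import defaultdict
import torch

def grouped_average(client_states, client_bitwidths):
    """Average client model weights separately within each bitwidth group.

    client_states: list of state_dicts (parameter name -> tensor).
    client_bitwidths: list of ints, the bitwidth of each client.
    Returns one averaged state_dict per bitwidth group (assumed reading of
    the 'grouped averaging' baseline mentioned above).
    """
    groups = defaultdict(list)
    for state, bits in zip(client_states, client_bitwidths):
        groups[bits].append(state)
    averaged = {}
    for bits, states in groups.items():
        averaged[bits] = {
            name: torch.stack([s[name].float() for s in states]).mean(dim=0)
            for name in states[0]
        }
    return averaged
```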
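
The Dataset Splits row quotes a uniform per-class split of the CIFAR-10 training instances across clients. A minimal sketch of such an IID split, assuming the labels are available as a NumPy array and that any non-divisible remainder per class is simply dropped (a detail the paper does not state), is:

```python
import numpy as np

def iid_per_class_split(labels, num_clients, num_classes=10, seed=0):
    """Uniformly split the training indices of each class across clients.

    labels: 1-D array of integer class labels for the training set
            (e.g. the 50,000 CIFAR-10 training labels).
    Returns a list of index arrays, one per client.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Drop the remainder so every client gets the same number of
        # instances per class (an assumption, not stated in the paper).
        per_client = len(idx) // num_clients
        for k in range(num_clients):
            client_indices[k].extend(idx[k * per_client:(k + 1) * per_client])
    return [np.asarray(ci) for ci in client_indices]
```

With 10 clients, for example, each client would receive 500 instances from each of the 10 classes, i.e. 5,000 training instances in total.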
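
The client-side hyperparameters quoted under Experiment Setup (200 local steps per round, SGD with learning rate 0.1 and momentum 0.9, gradient ℓ2-norm clipping at 2.0) could be wired up as below. This is a sketch for the Float32 client only; the model, data loader, and cross-entropy loss are placeholders, not taken from the paper.

```python
import itertools
import torch
import torch.nn.functional as F

def local_train(model, loader, local_steps=200, lr=0.1, momentum=0.9,
                max_grad_norm=2.0, device="cpu"):
    """One round of local training for a Float32 client (hedged sketch)."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    # Cycle through the client's data until 200 local steps are completed.
    data_iter = itertools.cycle(loader)
    for _ in range(local_steps):
        x, y = next(data_iter)
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Clip the gradient l2 norm to 2.0, as stated in the quoted setup.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
    return model.state_dict()
```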
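
Algorithm 1 ("Training of progressive weight dequantizer") is only named in the table, and the paper excerpt gives its optimizer settings (SGD, learning rate 0.01, batch size 16, 5 epochs). The following is a speculative sketch under strong assumptions: the dequantizer φ is taken to be a small MLP that maps fixed-size chunks of low-bitwidth weights toward their full-precision counterparts with an MSE objective; neither the architecture nor the objective is specified in the excerpt.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_dequantizer(quantized_chunks, full_precision_chunks,
                      chunk_size=64, epochs=5, lr=0.01, batch_size=16):
    """Train a weight dequantizer phi (speculative sketch).

    quantized_chunks, full_precision_chunks: float tensors of shape
    (num_chunks, chunk_size) holding dequantizer inputs and targets.
    """
    phi = nn.Sequential(                      # assumed architecture
        nn.Linear(chunk_size, 4 * chunk_size),
        nn.ReLU(),
        nn.Linear(4 * chunk_size, chunk_size),
    )
    optimizer = torch.optim.SGD(phi.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(quantized_chunks, full_precision_chunks),
                        batch_size=batch_size, shuffle=True)
    loss_fn = nn.MSELoss()                    # assumed reconstruction objective
    for _ in range(epochs):
        for q, w in loader:
            optimizer.zero_grad()
            loss = loss_fn(phi(q), w)
            loss.backward()
            optimizer.step()
    return phi
```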