Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization

Authors: Jaehong Yoon, Geon Park, Wonyong Jeong, Sung Ju Hwang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate ProWD against relevant FL baselines on the benchmark datasets, using clients with varying bitwidths. Our ProWD largely outperforms the baseline FL algorithms as well as naive approaches (e.g., grouped averaging) under the proposed BHFL scenario. (A hedged sketch of such grouped averaging appears after the table.)
Researcher Affiliation | Collaboration | (1) Korea Advanced Institute of Science and Technology (KAIST), South Korea; (2) AITRICS, South Korea.
Pseudocode | Yes | Algorithm 1: Training of progressive weight dequantizer.
Open Source Code | No | The paper does not contain an explicit statement or a direct link to open-source code for the methodology described.
Open Datasets | Yes | We use the widely used benchmark dataset for federated learning methods, CIFAR-10, to validate our method following the IID experimental settings of existing works (Reisizadeh et al., 2020; Haddadpour et al., 2021). CIFAR-10 is an image classification dataset that consists of 10 object classes, each of which has 5,000 training instances and 1,000 test instances.
Dataset Splits | No | CIFAR-10 is an image classification dataset that consists of 10 object classes, each of which has 5,000 training instances and 1,000 test instances. For FL purposes, we uniformly split the training instances per class by the number of clients participating in the federated learning system. (Explanation: The paper specifies training and test instances but does not explicitly mention a separate validation split for the main federated learning process, which is often crucial for hyperparameter tuning.) A sketch of the described IID per-class split appears after the table.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions general software components such as the SGD optimizer but does not provide specific version numbers for any key software dependencies or libraries.
Experiment Setup | Yes | At each round, we train each client for 200 local steps. The Float32 network is trained with SGD with a learning rate of 0.1 and momentum of 0.9. Additionally, the gradient ℓ2 norm is clipped to 2.0. ... We train a weight dequantizer φ with an SGD optimizer with a learning rate of 0.01 and a batch size of 16, for 5 epochs for all experiments. (Hedged sketches of the local training loop and the dequantizer training based on these settings appear after the table.)
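
The "grouped averaging" baseline mentioned under Research Type is not specified in this table. A minimal sketch, assuming it means averaging the parameters of clients that share the same bitwidth into one model per group, could look like the following; the function name and the per-group dictionary layout are illustrative assumptions, not the paper's definition.

```python
from collections import defaultdict
import torch

def grouped_average(client_states, client_bitwidths):
    """Average client model weights separately within each bitwidth group.

    client_states: list of state_dicts (parameter name -> tensor).
    client_bitwidths: list of ints, the bitwidth of each client.
    Returns one averaged state_dict per bitwidth group (assumed reading of
    the 'grouped averaging' baseline mentioned above).
    """
    groups = defaultdict(list)
    for state, bits in zip(client_states, client_bitwidths):
        groups[bits].append(state)
    averaged = {}
    for bits, states in groups.items():
        averaged[bits] = {
            name: torch.stack([s[name].float() for s in states]).mean(dim=0)
            for name in states[0]
        }
    return averaged
```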
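
The Dataset Splits row quotes a uniform per-class split of the CIFAR-10 training instances across clients. A minimal sketch of such an IID split, assuming the labels are available as a NumPy array and that any non-divisible remainder per class is simply dropped (a detail the paper does not state), is:

```python
import numpy as np

def iid_per_class_split(labels, num_clients, num_classes=10, seed=0):
    """Uniformly split the training indices of each class across clients.

    labels: 1-D array of integer class labels for the training set
            (e.g. the 50,000 CIFAR-10 training labels).
    Returns a list of index arrays, one per client.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Drop the remainder so every client gets the same number of
        # instances per class (an assumption, not stated in the paper).
        per_client = len(idx) // num_clients
        for k in range(num_clients):
            client_indices[k].extend(idx[k * per_client:(k + 1) * per_client])
    return [np.asarray(ci) for ci in client_indices]
```

With 10 clients, for example, each client would receive 500 instances from each of the 10 classes, i.e. 5,000 training instances in total.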
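
The client-side hyperparameters quoted under Experiment Setup (200 local steps per round, SGD with learning rate 0.1 and momentum 0.9, gradient ℓ2-norm clipping at 2.0) could be wired up as below. This is a sketch for the Float32 client only; the model, data loader, and cross-entropy loss are placeholders, not taken from the paper.

```python
import itertools
import torch
import torch.nn.functional as F

def local_train(model, loader, local_steps=200, lr=0.1, momentum=0.9,
                max_grad_norm=2.0, device="cpu"):
    """One round of local training for a Float32 client (hedged sketch)."""
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    # Cycle through the client's data until 200 local steps are completed.
    data_iter = itertools.cycle(loader)
    for _ in range(local_steps):
        x, y = next(data_iter)
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Clip the gradient l2 norm to 2.0, as stated in the quoted setup.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
    return model.state_dict()
```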
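
Algorithm 1 ("Training of progressive weight dequantizer") is only named in the table, and the paper excerpt gives its optimizer settings (SGD, learning rate 0.01, batch size 16, 5 epochs). The following is a speculative sketch under strong assumptions: the dequantizer φ is taken to be a small MLP that maps fixed-size chunks of low-bitwidth weights toward their full-precision counterparts with an MSE objective; neither the architecture nor the objective is specified in the excerpt.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_dequantizer(quantized_chunks, full_precision_chunks,
                      chunk_size=64, epochs=5, lr=0.01, batch_size=16):
    """Train a weight dequantizer phi (speculative sketch).

    quantized_chunks, full_precision_chunks: float tensors of shape
    (num_chunks, chunk_size) holding dequantizer inputs and targets.
    """
    phi = nn.Sequential(                      # assumed architecture
        nn.Linear(chunk_size, 4 * chunk_size),
        nn.ReLU(),
        nn.Linear(4 * chunk_size, chunk_size),
    )
    optimizer = torch.optim.SGD(phi.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(quantized_chunks, full_precision_chunks),
                        batch_size=batch_size, shuffle=True)
    loss_fn = nn.MSELoss()                    # assumed reconstruction objective
    for _ in range(epochs):
        for q, w in loader:
            optimizer.zero_grad()
            loss = loss_fn(phi(q), w)
            loss.backward()
            optimizer.step()
    return phi
```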