Bitwidth Heterogeneous Federated Learning with Progressive Weight Dequantization
Authors: Jaehong Yoon, Geon Park, Wonyong Jeong, Sung Ju Hwang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate ProWD against relevant FL baselines on the benchmark datasets, using clients with varying bitwidths. Our ProWD largely outperforms the baseline FL algorithms as well as naive approaches (e.g., grouped averaging) under the proposed BHFL scenario. |
| Researcher Affiliation | Collaboration | 1Korea Advanced Institute of Science and Technology (KAIST), South Korea 2AITRICS, South Korea. |
| Pseudocode | Yes | Algorithm 1: Training of progressive weight dequantizer (a hedged sketch of such a training loop appears after the table). |
| Open Source Code | No | The paper does not contain an explicit statement or a direct link to open-source code for the methodology described. |
| Open Datasets | Yes | We use the widely used benchmark dataset for federated learning methods, CIFAR-10, to validate our method following the IID experimental settings of existing works (Reisizadeh et al., 2020; Haddadpour et al., 2021). CIFAR-10 is an image classification dataset that consists of 10 object classes, each of which has 5,000 training instances and 1,000 test instances. |
| Dataset Splits | No | CIFAR-10 is an image classification dataset that consists of 10 object classes, each of which has 5,000 training instances and 1,000 test instances. For FL purposes, we uniformly split the training instances per class by the number of clients participating in the federated learning system (see the data-split sketch after the table). (Explanation: The paper specifies training and test instances but does not explicitly mention a separate validation split for the main federated learning process, which is often crucial for hyperparameter tuning.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions general software components, such as the SGD optimizer, but does not provide specific version numbers for any key software dependencies or libraries. |
| Experiment Setup | Yes | At each round, we train each client for 200 local steps. The Float32 network is trained with SGD with a learning rate of 0.1 and momentum of 0.9. Additionally, the gradient ℓ2 norm is clipped to 2.0. ... We train a weight dequantizer φ with an SGD optimizer with a learning rate of 0.01, batch size of 16, for 5 epochs for all experiments (see the local-training sketch after the table). |
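
The paper reports hyperparameters for training the weight dequantizer φ (Algorithm 1) but does not release code, so the following is a heavily hedged sketch. The `Dequantizer` module, the uniform quantizer, and the MSE reconstruction objective are illustrative assumptions and not the authors' Algorithm 1; only the optimizer choice (SGD), learning rate 0.01, batch size 16, and 5 epochs come from the table above.

```python
# Hedged sketch: training a weight dequantizer phi with the reported
# hyperparameters (SGD, lr 0.01, batch size 16, 5 epochs). The architecture,
# quantizer, and MSE loss below are assumptions, not the paper's Algorithm 1.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def uniform_quantize(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits (assumed scheme)."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale


class Dequantizer(nn.Module):
    """Tiny per-weight MLP that maps quantized values toward full precision (illustrative)."""

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, w_q):
        # Predict a residual correction on top of the quantized weight.
        return w_q + self.net(w_q.unsqueeze(-1)).squeeze(-1)


def train_dequantizer(full_precision_weights, bits=4, epochs=5):
    # Build (quantized, full-precision) weight pairs as the training data.
    w_fp = full_precision_weights.flatten()
    w_q = uniform_quantize(w_fp, bits)
    loader = DataLoader(TensorDataset(w_q, w_fp), batch_size=16, shuffle=True)

    phi = Dequantizer()
    optimizer = torch.optim.SGD(phi.parameters(), lr=0.01)
    for _ in range(epochs):
        for wq_batch, wfp_batch in loader:
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(phi(wq_batch), wfp_batch)
            loss.backward()
            optimizer.step()
    return phi
```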
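
The IID partitioning described in the Open Datasets and Dataset Splits rows (uniformly splitting each class's training instances across the participating clients) can be expressed as a short sketch. The function name `iid_split_per_class` and its signature are illustrative, not taken from the authors' code.

```python
# Minimal sketch of the IID split: each class's CIFAR-10 training instances
# are divided uniformly across the clients. Names are illustrative.
import numpy as np


def iid_split_per_class(labels, num_clients, num_classes=10, seed=0):
    """Return one index array per client with an equal share of every class."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        class_idx = np.flatnonzero(labels == c)
        rng.shuffle(class_idx)
        # Split this class's 5,000 training instances evenly across clients.
        for client_id, shard in enumerate(np.array_split(class_idx, num_clients)):
            client_indices[client_id].extend(shard.tolist())
    return [np.array(idx) for idx in client_indices]


if __name__ == "__main__":
    # Fake CIFAR-10-sized label vector: 50,000 labels over 10 classes.
    labels = np.repeat(np.arange(10), 5000)
    shards = iid_split_per_class(labels, num_clients=10)
    print([len(s) for s in shards])  # each client receives 5,000 instances
```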
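
A minimal sketch of the reported client-side setup for the Float32 clients: SGD with learning rate 0.1 and momentum 0.9, gradient ℓ2-norm clipping at 2.0, and 200 local steps per communication round. The `local_update` helper, the model, and the data loader are placeholder assumptions; only the hyperparameters come from the paper.

```python
# Hedged sketch of one client's local update with the reported hyperparameters.
import itertools

import torch
import torch.nn as nn


def local_update(model, loader, device="cpu", local_steps=200):
    model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    # Iterate exactly `local_steps` minibatches, cycling the loader if needed.
    for x, y in itertools.islice(itertools.cycle(loader), local_steps):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        # Clip the gradient L2 norm to 2.0, as stated in the paper.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0)
        optimizer.step()
    return model.state_dict()
```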