FedPara: Low-rank Hadamard Product for Communication-Efficient Federated Learning
Authors: Nam Hyeon-Woo, Moon Ye-Bin, Tae-Hyun Oh
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of FedPara with various network architectures, including VGG, ResNet, and LSTM, on standard classification benchmark datasets for both IID and non-IID settings. We evaluate our FedPara in terms of communication costs, the number of parameters, and compatibility with other FL methods. |
| Researcher Affiliation | Academia | Nam Hyeon-Woo¹, Moon Ye-Bin¹, Tae-Hyun Oh¹,²,³ (¹Department of Electrical Engineering, POSTECH; ²Graduate School of AI, POSTECH; ³Yonsei University) |
| Pseudocode | Yes | Algorithm 1: FedPara (a sketch of this parameterization appears below the table) |
| Open Source Code | Yes | Project page: https://github.com/South-hw/FedPara_ICLR22 |
| Open Datasets | Yes | In FedPara experiments, we use four popular FL datasets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), CINIC-10 (Darlow et al., 2018), and the subset of Shakespeare (Shakespeare, 1994). In pFedPara experiments, we use the subset of handwritten datasets: MNIST (LeCun et al., 1998) and FEMNIST (Caldas et al., 2018). |
| Dataset Splits | Yes | We split the datasets randomly into 100 partitions for the CIFAR-10 and CINIC-10 IID settings and 50 partitions for the CIFAR-100 IID setting. For the non-IID settings, we use the Dirichlet distribution for random partitioning and set the Dirichlet parameter α as 0.5 as suggested by He et al. (2020b). We assign one partition to each client and sample 16% of clients at each round during FL. (See the Dirichlet partitioning sketch below the table.) |
| Hardware Specification | Yes | For implementation, we use PyTorch Distributed library (Paszke et al., 2019) and 8 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | For implementation, we use PyTorch Distributed library (Paszke et al., 2019)... The paper mentions PyTorch but does not provide specific version numbers for it or any other key software dependencies. |
| Experiment Setup | Yes | We use FedAvg as a backbone optimization algorithm, and the hyper-parameters of our experiments, such as the initial learning rate η, local batch size B, and learning rate decay τ, are described in Table 6. (A one-round FedAvg sketch appears below the table.) |
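
The Pseudocode row references Algorithm 1 (FedPara). As a rough illustration of the underlying idea, the sketch below re-parameterizes a fully connected layer's weight as the Hadamard product of two rank-r factor pairs, W = (X1 Y1ᵀ) ⊙ (X2 Y2ᵀ), so the reconstructed weight can reach rank up to r² while only the small factors need to be stored and communicated. This is a minimal sketch assuming PyTorch; the module name and the initialization choice are illustrative and are not taken from the released repository.

```python
# Minimal sketch of a FedPara-style low-rank Hadamard product layer (assumed
# names; not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowRankHadamardLinear(nn.Module):
    """W = (X1 @ Y1.T) * (X2 @ Y2.T): Hadamard product of two rank-r factors.

    Each factor pair has rank at most r, so their element-wise product can
    reach rank up to r**2, while only 2 * r * (out_features + in_features)
    weight parameters exist to be exchanged in federated learning.
    """

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.X1 = nn.Parameter(torch.empty(out_features, rank))
        self.Y1 = nn.Parameter(torch.empty(in_features, rank))
        self.X2 = nn.Parameter(torch.empty(out_features, rank))
        self.Y2 = nn.Parameter(torch.empty(in_features, rank))
        self.bias = nn.Parameter(torch.zeros(out_features))
        for p in (self.X1, self.Y1, self.X2, self.Y2):
            nn.init.kaiming_uniform_(p, a=5 ** 0.5)  # illustrative init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the full weight from the low-rank Hadamard factors.
        W = (self.X1 @ self.Y1.T) * (self.X2 @ self.Y2.T)
        return F.linear(x, W, self.bias)
```

A convolutional layer can be treated analogously by reshaping the reconstructed matrix into the kernel shape; see the paper for the full construction and the rank analysis.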
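The Dataset Splits row describes Dirichlet-based non-IID partitioning with α = 0.5 over the clients. The following is a minimal sketch of that style of label-skew partitioning, assuming NumPy; the function name and the per-class splitting details are assumptions rather than the authors' exact procedure.

```python
# Sketch of Dirichlet label-skew partitioning (hypothetical helper, NumPy).
import numpy as np


def dirichlet_partition(labels: np.ndarray, num_clients: int = 100,
                        alpha: float = 0.5, seed: int = 0):
    """Return one array of sample indices per client."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class-c samples assigned to each client ~ Dir(alpha).
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client_id, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client_id].extend(chunk.tolist())
    return [np.array(ci) for ci in client_indices]
```

With α = 0.5 most clients receive a skewed class mixture; larger α values approach the IID split, and smaller values concentrate each class on fewer clients.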
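The Experiment Setup row states that FedAvg is the backbone optimization algorithm, with 16% of clients sampled per round. Below is a minimal sketch of one FedAvg communication round, assuming PyTorch models; `local_update` is a hypothetical callback that runs local SGD on a client and returns its sample count, and the aggregation shown is plain size-weighted averaging.

```python
# Sketch of a single FedAvg round (illustrative, not the paper's code).
import copy
import random


def fedavg_round(global_model, clients, local_update, client_fraction=0.16):
    """Sample clients, run local training, and average the returned weights."""
    num_sampled = max(1, int(client_fraction * len(clients)))
    sampled = random.sample(clients, num_sampled)

    local_states, local_sizes = [], []
    for client in sampled:
        local_model = copy.deepcopy(global_model)
        num_samples = local_update(local_model, client)  # local SGD epochs
        local_states.append(local_model.state_dict())
        local_sizes.append(num_samples)

    # Weighted average of parameters, proportional to local dataset sizes.
    total = sum(local_sizes)
    avg_state = copy.deepcopy(local_states[0])
    for key in avg_state:
        avg_state[key] = sum(
            state[key] * (n / total)
            for state, n in zip(local_states, local_sizes)
        )
    global_model.load_state_dict(avg_state)
    return global_model
```

Under FedPara, the state dict exchanged in such a round contains only the low-rank Hadamard factors, which is the source of the communication savings the paper reports.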