Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning
Authors: Yae Jee Cho, Andre Manoel, Gauri Joshi, Robert Sim, Dimitrios Dimitriadis
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on image and language tasks show that Fed-ET significantly outperforms other state-of-the-art FL algorithms with fewer communicated parameters, and is also robust against high data heterogeneity. For all experiments, partial client participation is considered, where 10 clients are sampled from the 100 clients for image tasks and from the 106 clients for the language task. |
| Researcher Affiliation | Collaboration | Microsoft Research; Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 Federated Ensemble Transfer: Fed-ET |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Datasets. For image datasets, the training dataset is partitioned in a data-heterogeneous manner amongst a total of 100 clients using the Dirichlet distribution $\mathrm{Dir}_K(\alpha)$ [Hsu et al., 2019] (a partitioning sketch follows the table). The public dataset is generated by applying a different data transformation to data samples (non-overlapping with either the training or test dataset) to further differentiate it from the training dataset. For the language task, we use sentiment classification with the Sent140 (Twitter) dataset. For the training dataset, users with more than 100 data samples are treated as the FL clients, leading to a total of 106 clients. |
| Dataset Splits | No | The paper mentions training and test datasets, and a public dataset, but does not explicitly provide information about a separate validation dataset or its split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that are needed to replicate the experiment. |
| Experiment Setup | Yes | Local updates follow $w_k^{(t,\tau)} = w_k^{(t,0)} - \eta_t \sum_{r=0}^{\tau-1} \tfrac{1}{b}\sum_{\xi \in \xi_k^{(t,r)}} \nabla f(w_k^{(t,r)}, \xi)$ (1), where $\eta_t$ is the learning rate and $\tfrac{1}{b}\sum_{\xi \in \xi_k^{(t,r)}} \nabla f(w_k^{(t,r)}, \xi)$ is the stochastic gradient over the mini-batch $\xi_k^{(t,r)}$ of size $b$ randomly sampled from $\mathcal{B}_k$. After the clients $k \in \mathcal{S}^{(t,0)}$ finish their local updates, the models $w_k^{(t,\tau)}, k \in \mathcal{S}^{(t,0)}$ are sent to the server. Finally, combining the weighted-consensus cross-entropy loss in (5) with the diversity regularization in (9), the server model is updated, in every communication round $t$, by minimizing the objective $F(w^{(t,0)}) = \tfrac{1}{\lvert\mathcal{P}\rvert}\sum_{x \in \mathcal{P}} \big[\ell\big((x, y_s^{(t,\tau)}(x)), w^{(t,0)}\big) + \lambda\,\mathrm{KL}\big(s_{\mathrm{div}}^{(t,\tau)}(x), s(w^{(t,0)}, x)\big)\big]$ (10). Effect of the diversity parameter $\lambda$: Table 4 shows the performance of Fed-ET with different values of $\lambda$, which modulates the diversity regularization term in (10). A PyTorch sketch of this objective follows the table. |
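
The Dirichlet partition quoted in the Open Datasets row can be illustrated with a short sketch. The paper does not release code, so the snippet below is only an assumption-labeled illustration of a $\mathrm{Dir}_K(\alpha)$ split over clients: the function name `dirichlet_partition`, the default `alpha`, and the seed are illustrative and not taken from the paper.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=100, alpha=0.1, seed=0):
    """Partition sample indices across clients with per-class proportions
    drawn from Dir(alpha); smaller alpha gives higher data heterogeneity.
    num_clients=100 mirrors the paper's image-task setup; alpha and seed
    are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # Fraction of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions) * len(idx_c)).astype(int)[:-1]
        for client_id, chunk in enumerate(np.split(idx_c, splits)):
            client_indices[client_id].extend(chunk.tolist())
    return [np.array(ix) for ix in client_indices]
```

With a small `alpha`, each client ends up holding only a few classes, which reproduces the kind of label skew the Dirichlet partition is used for.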
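The server objective (10) quoted in the Experiment Setup row combines a cross-entropy term on the weighted-consensus pseudo-labels with a $\lambda$-weighted KL diversity term evaluated on the public dataset. Below is a minimal PyTorch sketch of that combination; the function name, tensor shapes, and the default `lam` are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def server_distillation_loss(server_logits, consensus_labels, diversity_probs, lam=0.1):
    """Sketch of Eq. (10): cross-entropy against the weighted-consensus
    pseudo-labels plus lambda times KL(s_div, s(w, x)).

    server_logits:    (B, C) logits s(w, x) of the server model on a public-data batch
    consensus_labels: (B,)   hard consensus pseudo-labels y_s(x) from the client ensemble
    diversity_probs:  (B, C) diversity distribution s_div(x) built from client models
    lam:              illustrative default for the diversity parameter lambda
    """
    ce = F.cross_entropy(server_logits, consensus_labels)
    # F.kl_div expects log-probabilities as the first argument and computes
    # KL(target || input), i.e. KL(s_div || s(w, x)) here.
    kl = F.kl_div(F.log_softmax(server_logits, dim=-1), diversity_probs,
                  reduction="batchmean")
    return ce + lam * kl

# Example with random tensors standing in for a public-set batch.
logits = torch.randn(8, 10)                      # server model outputs
labels = torch.randint(0, 10, (8,))              # consensus pseudo-labels
div = torch.softmax(torch.randn(8, 10), dim=-1)  # diversity distribution
loss = server_distillation_loss(logits, labels, div, lam=0.1)
```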