Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Federated Learning under Covariate Shifts with Generalization Guarantees

Authors: Ali Ramezani-Kebrya, Fanghui Liu, Thomas Pethick, Grigorios Chrysos, Volkan Cevher

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the superiority of FTW-ERM over existing FL baselines in challenging imbalanced federated settings with data distribution shifts across clients. The authors experimentally demonstrate more than 16% overall test accuracy improvement over existing FL baselines when training ResNet-18 (He et al., 2016) on CIFAR10 (Krizhevsky) in these settings. In conclusion, they expand the concept and application scope of FL to a general setting under intra/inter-client covariate shifts, provide an in-depth theoretical understanding of learning with FTW-ERM via a general DRM, and experimentally validate the utility of the proposed framework.
Researcher Affiliation | Academia | Ali Ramezani-Kebrya EMAIL Department of Informatics, University of Oslo and Visual Intelligence Centre. Fanghui Liu EMAIL Laboratory for Information and Inference Systems (LIONS), EPFL. Thomas Pethick EMAIL Laboratory for Information and Inference Systems (LIONS), EPFL. Grigorios Chrysos EMAIL Laboratory for Information and Inference Systems (LIONS), EPFL. Volkan Cevher EMAIL Laboratory for Information and Inference Systems (LIONS), EPFL.
Pseudocode | Yes | Algorithm 1: Histogram-based density ratio matching.
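For intuition, the core idea behind histogram-based density ratio matching can be sketched in a few lines of numpy. This is a minimal one-dimensional illustration of the general technique, not a reproduction of the paper's Algorithm 1, which operates in the federated setting; the function name and binning choices here are assumptions for illustration.

```python
import numpy as np

def histogram_density_ratio(source, target, bins=10):
    """Estimate w(x) = p_target(x) / p_source(x) via shared histogram bins.

    Illustrative 1-D sketch: normalized bin frequencies serve as
    piecewise-constant density estimates, and each source sample is
    assigned the ratio of its bin.
    """
    lo = min(source.min(), target.min())
    hi = max(source.max(), target.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_s, _ = np.histogram(source, bins=edges, density=True)
    p_t, _ = np.histogram(target, bins=edges, density=True)
    eps = 1e-12  # guard against empty source bins
    ratio = p_t / (p_s + eps)
    # Map each source sample to its bin's ratio.
    idx = np.clip(np.digitize(source, edges) - 1, 0, bins - 1)
    return ratio[idx]
```

When source and target are drawn from the same distribution, the estimated weights concentrate around 1, which is the expected behavior under no covariate shift.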
Open Source Code | No | The paper does not provide an explicit statement about releasing code, a direct link to a code repository, or mention of code in supplementary materials.
Open Datasets | Yes | For MNIST-based experiments we use a LeNet (LeCun et al., 1989) with cross entropy loss and compute standard deviations over 5 independent executions. For CIFAR10-based experiments we use the larger ResNet-18 (He et al., 2016). We make use of three datasets in the experiments: MNIST (LeCun et al., 1998), FashionMNIST (Xiao et al., 2017), and CIFAR10 (Krizhevsky).
Dataset Splits | Yes | We split the 10-class FashionMNIST dataset between 5 clients and simulate a target shift by including different fractions of examples from each class across the training data and test data. Table 1: FashionMNIST with label shift across five clients, where each client receives different fractions of examples from each class. Table 6: CIFAR10 target shift distribution across 100 clients, where groups of 10 clients share the same distribution. Table 9: FashionMNIST target shift distribution.
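The label-shift simulation described above (each client receiving different fractions of each class) can be sketched as follows. The helper `split_with_label_shift` is hypothetical, written for illustration under the assumption that per-client class fractions are given as a matrix; the paper's exact split procedure and fractions are reported in its tables.

```python
import numpy as np

def split_with_label_shift(labels, client_class_fracs, rng=None):
    """Partition example indices across clients with per-client class fractions.

    client_class_fracs[k][c] is the fraction of class c's examples assigned
    to client k; each column should sum to at most 1.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    fracs = np.asarray(client_class_fracs, dtype=float)
    n_clients, n_classes = fracs.shape
    clients = [[] for _ in range(n_clients)]
    for c in range(n_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        start = 0
        for k in range(n_clients):
            take = int(round(fracs[k, c] * len(idx)))
            clients[k].extend(idx[start:start + take])
            start += take
    return [np.array(sorted(ix)) for ix in clients]
```

For example, with two classes and fractions `[[0.8, 0.2], [0.2, 0.8]]`, client 0 receives mostly class 0 and client 1 mostly class 1, producing the kind of imbalanced per-client distributions the experiments target.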
Hardware Specification | No | All experiments are carried out on an internal cluster using one GPU. This statement is too general and does not provide specific hardware models (e.g., GPU model, CPU type, memory details).
Software Dependencies | No | The stochastic gradients for each of the clients are computed with a batch size of 64 and aggregated on the server, which uses the Adam optimizer. Experiments on MNIST and FashionMNIST use a LeNet (LeCun et al., 1998), a learning rate of 0.001, no weight decay, and run for 5,000 iterations. For CIFAR10 experiments we use the larger ResNet-18 (He et al., 2016). While specific models and optimizers are mentioned, no version numbers for programming languages, libraries (e.g., PyTorch, TensorFlow), or other software are provided.
Experiment Setup | Yes | The stochastic gradients for each of the clients are computed with a batch size of 64 and aggregated on the server, which uses the Adam optimizer. Experiments on MNIST and FashionMNIST use a LeNet (LeCun et al., 1998), a learning rate of 0.001, no weight decay, and run for 5,000 iterations. For CIFAR10 experiments we use the larger ResNet-18 (He et al., 2016). Batch normalization in ResNet-18 is treated by averaging the statistics on the server and subsequently broadcasting to the workers. A learning rate of 0.0001 and weight decay of 0.0001 are used. We report the best iterate in terms of average test accuracy after 20,000 iterations in Table 7. The partial client participation experiment in Table 2 uses 200,000 iterations.
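The batch-normalization treatment quoted above (averaging statistics on the server, then broadcasting) can be sketched as a simple server-side aggregation. This is a minimal numpy illustration under the assumption that each client reports its running mean and variance arrays; the paper may weight clients differently (e.g., by sample count), which is not shown here.

```python
import numpy as np

def average_bn_statistics(client_stats):
    """Average per-client batch-norm running statistics on the server.

    client_stats is a list of (running_mean, running_var) array pairs,
    one per client. Returns the unweighted elementwise averages, which
    the server would broadcast back to all workers.
    """
    means = np.stack([m for m, _ in client_stats])
    vars_ = np.stack([v for _, v in client_stats])
    return means.mean(axis=0), vars_.mean(axis=0)
```

An unweighted average is the simplest choice; with heterogeneous client dataset sizes, a sample-count-weighted average would be a natural refinement.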