Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning

Authors: Divyansh Jhunjhunwala, Pranay Sharma, Zheng Xu, Gauri Joshi

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we provide a deeper theoretical understanding of this phenomenon. To do so, we study the class of two-layer convolutional neural networks (CNNs) and provide bounds on the training error convergence and test error of such a network trained with FedAvg. We introduce the notion of aligned and misaligned filters at initialization and show that the data heterogeneity only affects learning on misaligned filters... Experiments in synthetic settings and practical FL training on CNNs verify our theoretical findings.
Researcher Affiliation | Collaboration | Divyansh Jhunjhunwala (EMAIL), Carnegie Mellon University; Pranay Sharma (EMAIL), IIT Bombay; Zheng Xu (EMAIL), Google; Gauri Joshi (EMAIL), Carnegie Mellon University
Pseudocode | No | The paper describes the steps of the FedAvg algorithm verbally and with mathematical equations, but it does not present them within a clearly labeled 'Pseudocode' or 'Algorithm' block with structured code-like formatting.
Open Source Code | No | The paper does not provide a direct link to a source code repository, an explicit statement of code release for the methodology described, or any mention of code in supplementary materials. It only mentions using the 'ResNet18 model (He et al., 2016)' and 'PyTorch (Paszke et al., 2019)', which are third-party tools/models.
Open Datasets | Yes | We simulate a FL setup with K = 10 clients on the CIFAR10 data partitioned using Dirichlet(α)... For pre-training, we consider a SqueezeNet model pre-trained on ImageNet (Russakovsky et al., 2015)... To demonstrate this, we consider federated training on the 1. CIFAR-10 (Krizhevsky, 2009) and 2. Tiny ImageNet (Le & Yang, 2015) datasets.
Dataset Splits | Yes | We simulate a FL setup with K = 10 clients on the CIFAR10 data partitioned using Dirichlet(α) with α = 0.1 for the non-IID setting and α = 10 for the IID setting... We simulate a FL setup with K = 20 clients using Dirichlet(α) (Hsu et al., 2019)... with α = 0.05 heterogeneity (Figure 6a and Figure 6b) and α = 10 heterogeneity (Figure 6c and Figure 6d).
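The Dirichlet(α) label-skew partition quoted above follows the standard Hsu et al. (2019) recipe. A minimal sketch of how such a split is typically generated (the function name, seed, and toy labels below are illustrative; the authors' own partitioning code is not released):

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.1, seed=0):
    """Assign example indices to clients with per-class proportions drawn
    from Dirichlet(alpha): small alpha (e.g. 0.1) gives highly non-IID
    label skew, large alpha (e.g. 10) is near-IID."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Fraction of class-c examples each client receives
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices

# Toy labels standing in for CIFAR-10's 10 classes
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=10, alpha=0.1)
```

With α = 0.1 most clients end up dominated by a few classes, which is the non-IID regime the quoted experiments study.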
Hardware Specification | Yes | For experiments on neural network training we use one H100 GPU with 2 cores and 20GB memory. For synthetic data experiments we use one T4 GPU.
Software Dependencies | No | We use PyTorch (Paszke et al., 2019) to run all our algorithms and also simulate our synthetic data setting. Following Nguyen et al. (2022) we replace the BatchNorm layers in the model with GroupNorm (Wu & He, 2018).
Experiment Setup | Yes | For FL optimization we use the vanilla FedAvg optimizer with server step size ηg = 1 and train the model for 500 rounds and 1 local epoch at each client. For centralized optimization we use the SGD optimizer and run the optimization for 200 epochs. Learning rates were tuned using grid search with the grid {0.1, 0.01, 0.001}... In the case of random initialization, for local optimization we use the SGD optimizer with a learning rate of 0.01 and 0.9 momentum. In the case of pre-trained initialization, for local optimization we use the SGD optimizer with a learning rate of 0.001 and 0.9 momentum. The learning rate is decayed by a factor of 0.998 every round for both initializations.
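The quoted FedAvg configuration (server step size ηg = 1, 1 local epoch per round, 0.998 per-round learning-rate decay) can be sketched in a few lines of numpy. This is a toy stand-in, not the authors' code: the least-squares objective replaces their CNN loss, each "local epoch" is collapsed to one full-batch gradient step, and momentum is omitted since a single step from a zero buffer reduces to plain gradient descent:

```python
import numpy as np

def local_update(w, X, y, lr):
    """One client's local pass: a full-batch gradient step on a
    least-squares loss (stand-in for 1 local epoch of SGD on the
    paper's CNN objective)."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg(w0, clients, rounds=500, lr=0.01, eta_g=1.0, decay=0.998):
    """Vanilla FedAvg with the reported knobs: server step size
    eta_g = 1, one local pass per round, client learning rate decayed
    by a factor of 0.998 each round."""
    w = w0.copy()
    for _ in range(rounds):
        deltas = [local_update(w, X, y, lr) - w for X, y in clients]
        w = w + eta_g * np.mean(deltas, axis=0)  # server averages client updates
        lr *= decay
    return w

# Toy clients sharing one ground-truth linear model
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
clients = []
for _ in range(10):
    X = rng.normal(size=(20, 5))
    clients.append((X, X @ w_true))
w = fedavg(np.zeros(5), clients)
```

With ηg = 1 the server step is exactly the plain average of client updates; the paper's study of pre-trained vs. random initialization corresponds to the choice of `w0` here.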