Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

TTA-FedDG: Leveraging Test-Time Adaptation to Address Federated Domain Generalization

Authors: Haoyuan Liang, Xinyu Zhang, Shilei Cao, Guowen Li, Juepeng Zheng

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments have demonstrated the effectiveness of FedSPL in handling domain shift, outperforming existing FedDG methods across multiple datasets and model architectures. [...] Experiments Setup In this section, we briefly introduce the experimental setup. Unless otherwise specified, all experiments will be conducted under the following conditions. The default settings are as follows: Dataset. We evaluate our proposed method on three widely used domain benchmarks. Comparing Methods. We selected several representative methods from the TTA and FL fields for comparison, with FedAvg (McMahan et al. 2017) serving as the baseline method. Implementation Details. For local model training across the PACS, Office-Home, and Digit-5 datasets, we utilize architectures ResNet18 and ResNet50... Evaluation Metrics. The evaluation metric is the performance of the model when each domain serves as the test domain. Results We achieved state-of-the-art results on three widely used datasets, particularly on PACS. Ablation Studies Effectiveness of MBSL and FMR We performed a macro-level ablation study on the two main modules of our model. As observed in Tab.4, both MBSL and FMR contribute to improving the baseline model and complement each other. Effect of Different Loss Functions In our ablation experiments on different loss functions, we observed that when generalizing to similar domains, methods Ls, Lw, and Lc all contribute positively to varying degrees as shown in Tab.2.
Researcher Affiliation Academia School of Artificial Intelligence, Sun Yat-sen University, China EMAIL, EMAIL
Pseudocode Yes The pseudocode of our method is in Algorithm 1. [...] Algorithm 1: Federated domain generalization based on selecting Strong Pseudo Labels (FedSPL)
Open Source Code No No explicit statement about open-sourcing the code or a link to a code repository is provided in the paper.
Open Datasets Yes Dataset. We evaluate our proposed method on three widely used domain benchmarks. (i) The PACS (Li et al. 2017) dataset... (ii) The Office-Home (Venkateswara et al. 2017) dataset... (iii) The Digit-5 (Wei and Han 2024) dataset...
Dataset Splits Yes To evaluate the generalization capability of our model, we followed the standard split scheme for training and validation, and we conducted extensive ablation experiments on this dataset. [...] We adopt a leave-one-domain-out evaluation method for all benchmarks, where one domain is reserved for testing and the remaining domains are used for training and validation purposes.
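The leave-one-domain-out protocol quoted above can be sketched as follows. This is an illustrative sketch only, not the paper's code: the function name, the 10% validation ratio, and the per-domain sample representation are assumptions (the paper states only that one domain is held out for testing and the rest are split into training and validation).

```python
import random

# Hypothetical domain names following the PACS benchmark mentioned above.
PACS_DOMAINS = ["photo", "art_painting", "cartoon", "sketch"]

def leave_one_domain_out(samples_by_domain, test_domain, val_ratio=0.1, seed=0):
    """Hold out `test_domain` entirely for evaluation; split every other
    domain's samples into train/val. The val_ratio of 0.1 is an assumption,
    not a value reported in the paper."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val = {}, {}
    for domain, samples in samples_by_domain.items():
        if domain == test_domain:
            continue  # the held-out domain is never seen during training
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1 - val_ratio))
        train[domain], val[domain] = shuffled[:cut], shuffled[cut:]
    test = {test_domain: list(samples_by_domain[test_domain])}
    return train, val, test
```

Iterating `test_domain` over all four PACS domains reproduces the evaluation scheme in which each domain takes a turn as the unseen test domain.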
Hardware Specification No The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies No The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes The default settings are as follows: ... standardize the batch size and learning rate at 128 and 0.2, respectively, during local training. Furthermore, to guarantee that the local models reach convergence within each training phase, we set the number of local epochs E to 1 and define the total number of communication rounds R as 200. The hyperparameters for our teacher-student model have already been provided in the Method section.
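For concreteness, the reported settings (batch size 128, learning rate 0.2, E = 1 local epoch, R = 200 communication rounds) can be paired with the FedAvg baseline aggregation the paper compares against. This is a minimal sketch of standard FedAvg-style weighted averaging, not the paper's FedSPL method; the model is abstracted as a flat parameter vector and client training is omitted.

```python
# Default settings as reported in the paper's experiment setup.
CONFIG = {"batch_size": 128, "lr": 0.2, "local_epochs": 1, "rounds": 200}

def fedavg(client_params, client_sizes):
    """FedAvg aggregation: average client parameter vectors weighted by
    each client's local dataset size (McMahan et al. 2017)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    avg = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        weight = n / total
        for i in range(dim):
            avg[i] += weight * params[i]
    return avg
```

In a full federated run, each of the R = 200 rounds would dispatch the global parameters to clients, run E = 1 local epoch per client, and then call `fedavg` on the returned parameter vectors.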