StableFDG: Style and Attention Based Learning for Federated Domain Generalization

Authors: Jungwuk Park, Dong-Jun Han, Jinho Kim, Shiqiang Wang, Christopher Brinton, Jaekyun Moon

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its efficacy.
Researcher Affiliation | Collaboration | Jungwuk Park (KAIST, savertm@kaist.ac.kr); Dong-Jun Han (Purdue University, han762@purdue.edu); Jinho Kim (SK Hynix, jinho123.kim@sk.com); Shiqiang Wang (IBM Research, wangshiq@us.ibm.com); Christopher G. Brinton (Purdue University, cgb@purdue.edu); Jaekyun Moon (KAIST, jmoon@kaist.edu)
Pseudocode | Yes | Algorithm 1 summarizes the overall process of our StableFDG.
Open Source Code | No | Our code is built upon the official code of [39] and [1]. No explicit statement or link is provided for the code developed for this specific paper.
Open Datasets | Yes | We consider five datasets commonly adopted in DG literature: PACS [14], VLCS [6], Digits-DG [38], Office-Home [33], and DomainNet [30].
Dataset Splits | Yes | We follow the conventional leave-one-domain-out protocol, where one domain is selected as the target and the remaining domains are used as sources. ... We consider a setup with N = 30 clients and distribute the training set in two different ways: single-domain and multi-domain data distribution scenarios. In the single-domain setup, each client has training data belonging to a single source domain. ... In the multi-domain setup, each client can hold data from multiple domains, but the domain distribution within each client is heterogeneous. For each domain, we sample the heterogeneous proportions from a Dirichlet distribution with dimension N = 30 and parameter 0.5, and distribute the training samples of each domain to individual clients according to the sampled proportions. (A partitioning sketch follows the table.)
Hardware Specification | Yes | We also compare the computation time by measuring the time required for a local update at each client using a GTX 1080 Ti GPU. (A timing sketch follows the table.)
Software Dependencies | No | Our code is built upon the official code of [39] and [1]. No specific version numbers for software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages are provided.
Experiment Setup | Yes | The exploration level α is set to 3 for all experiments regardless of dataset. For our attention module, we set the embedding size d of queries Q and keys K to 30... FL is performed for 50 global rounds, and we train the local model for 5 epochs with a mini-batch size of 32. ... We use SGD as the optimizer with a momentum of 0.9 and a weight decay of 5e-4. For PACS, Office-Home, and VLCS, the learning rate is set to 0.001 and cosine annealing is used as the scheduler. For Digits-DG, we set the learning rate to 0.02 and decay it by 0.1 every 20 steps. (An optimizer/scheduler sketch follows the table.)
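
The multi-domain partitioning described in the Dataset Splits row can be sketched as follows. This is a minimal illustration assuming NumPy and a hypothetical mapping from each source domain to its sample indices; it is not the authors' released script.

```python
import numpy as np

def dirichlet_partition(domain_indices, num_clients=30, alpha=0.5, seed=0):
    """Distribute samples of each source domain across clients using
    proportions drawn from a Dirichlet(alpha) distribution, mirroring the
    multi-domain data distribution setup quoted above.

    domain_indices: dict mapping domain name -> list of sample indices.
    Returns: dict mapping client id -> list of sample indices.
    """
    rng = np.random.default_rng(seed)
    client_indices = {c: [] for c in range(num_clients)}
    for domain, indices in domain_indices.items():
        indices = np.array(indices)
        rng.shuffle(indices)
        # One proportion per client for this domain.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Convert cumulative proportions into split points over the shuffled indices.
        split_points = (np.cumsum(proportions)[:-1] * len(indices)).astype(int)
        for client_id, shard in enumerate(np.split(indices, split_points)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices
```

Drawing one Dirichlet vector per domain with concentration 0.5 makes the per-client shares markedly unequal, which is what produces the heterogeneous (non-IID) domain mixture across the 30 clients. The single-domain setup would instead assign each client data from exactly one source domain.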
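For the Hardware Specification row, the paper only names the GPU; one plausible way to time a local update in PyTorch (a generic sketch, not the authors' measurement code) is to synchronize the device around the update loop:

```python
import time
import torch

def time_local_update(model, loader, optimizer, loss_fn, device="cuda"):
    """Measure the wall-clock time of one local update pass on the given device.
    Synchronizing before and after avoids counting queued-but-unfinished
    CUDA kernels in the measurement."""
    model.to(device).train()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    torch.cuda.synchronize()
    return time.perf_counter() - start
```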
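The Experiment Setup row translates directly into an optimizer/scheduler configuration. The sketch below assumes PyTorch and that the cosine schedule spans the 50 global rounds times 5 local epochs; the exact T_max is not stated in the quoted text.

```python
import torch

def build_optimizer_and_scheduler(model, dataset, local_epochs=5, global_rounds=50):
    """Reproduce the quoted hyperparameters: SGD with momentum 0.9 and weight
    decay 5e-4; lr 0.001 with cosine annealing for PACS, Office-Home, and VLCS;
    lr 0.02 decayed by 0.1 every 20 steps for Digits-DG."""
    if dataset in ("PACS", "Office-Home", "VLCS"):
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                                    momentum=0.9, weight_decay=5e-4)
        # T_max over all local epochs across global rounds is an assumption.
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer, T_max=global_rounds * local_epochs)
    elif dataset == "Digits-DG":
        optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                                    momentum=0.9, weight_decay=5e-4)
        scheduler = torch.optim.lr_scheduler.StepLR(
            optimizer, step_size=20, gamma=0.1)
    else:
        raise ValueError(f"Unknown dataset: {dataset}")
    return optimizer, scheduler
```

For example, build_optimizer_and_scheduler(model, "PACS") would return SGD at lr 0.001 with a cosine-annealed schedule, matching the quoted setup for that dataset.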