Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Personalized Federated Learning with Spurious Features: An Adversarial Approach
Authors: Xiaoyang Wang, Han Zhao, Klara Nahrstedt, Sanmi Koyejo
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on object and action recognition tasks show that our proposed approach bounds personalized models from further exploiting spurious features while preserving the benefit of enhanced accuracy from fine-tuning. We conduct extensive experiments to validate the effectiveness of the proposed methods under FL settings. Our experiments on MNIST (Deng, 2012), Coil20 (Nene et al., 1996), CelebA (Liu et al., 2015; Caldas et al., 2018), and biased action recognition (BAR) (Nam et al., 2020) datasets show that the proposed approach reduces the accuracy disparity of personalized models from 18.38% to 3.42%. Our method also preserves the benefit of the enhanced average accuracy from fine-tuning, resulting in 4.48% accuracy improvement in the global environment. |
| Researcher Affiliation | Academia | Xiaoyang Wang, Department of Computer Science, University of Illinois at Urbana-Champaign; Han Zhao, Department of Computer Science, University of Illinois at Urbana-Champaign; Klara Nahrstedt, Department of Computer Science, University of Illinois at Urbana-Champaign; Sanmi Koyejo, Department of Computer Science, Stanford University |
| Pseudocode | No | The paper describes methods and processes in narrative text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks with structured, step-by-step instructions. |
| Open Source Code | No | The paper states it was "Reviewed on OpenReview: https://openreview.net/forum?id=N2wx9UVHkH", but this link points to a review forum, not a source-code repository. The paper contains no explicit statement about the release of source code for the described methodology, nor a direct link to a code repository. |
| Open Datasets | Yes | We conduct extensive experiments to validate the effectiveness of the proposed methods under FL settings. Our experiments on MNIST (Deng, 2012), Coil20 (Nene et al., 1996), CelebA (Liu et al., 2015; Caldas et al., 2018), and biased action recognition (BAR) (Nam et al., 2020) datasets show that the proposed approach reduces the accuracy disparity of personalized models from 18.38% to 3.42%. |
| Dataset Splits | Yes | Local datasets are further partitioned to train/validation/test set with a ratio of 72:8:20, following prior work (Li et al., 2021). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., GPU models, CPU types, or other accelerator specifications). |
| Software Dependencies | No | We use Adam optimizer (Kingma & Ba, 2015) throughout our experiments with a learning rate of 1e-4 for MNIST, CelebA, and BAR and 2e-4 for Coil20. While an optimizer is mentioned, specific software dependencies such as programming-language versions or machine learning framework versions (e.g., PyTorch, TensorFlow) are not provided. |
| Experiment Setup | Yes | We use Adam optimizer (Kingma & Ba, 2015) throughout our experiments with a learning rate of 1e-4 for MNIST, CelebA, and BAR and 2e-4 for Coil20. ... We train the global model for 500 rounds. 5 clients are selected per round, each performing 5 epochs of local updates. We tune the coefficients of the adversarial transferability and L2 regularization terms from {0.01, 0.1, 1.0, 10.0} and select the largest value that does not decrease the validation accuracy during penalization. We start the attack budget at 0.031 (i.e., 8/255) and gradually decrease it such that 30%–50% of the attack succeeds. We configure ϵ to 0.031/0.01/0.031 for MNIST/CelebA/Coil20, respectively. We fine-tune the global model for 5 epochs on MNIST/BAR and 10 epochs on Coil20/CelebA, which are sufficient for the personalized models to converge. |
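Two of the reproducibility details quoted above are concrete enough to sketch in code: the 72:8:20 local train/validation/test partition (Dataset Splits row) and the rule for tuning penalty coefficients, i.e., selecting the largest value from {0.01, 0.1, 1.0, 10.0} that does not decrease validation accuracy (Experiment Setup row). The paper does not release code, so the function names below and the accuracy values in the usage example are purely illustrative:

```python
import random

def split_local_dataset(samples, seed=0):
    """Partition one client's local dataset into train/validation/test
    with the 72:8:20 ratio reported in the paper (following Li et al., 2021).
    `samples` is any list of examples; this helper is hypothetical."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.72)
    n_val = int(n * 0.08)
    # The remainder (20%) becomes the test set.
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def select_coefficient(candidates, val_acc):
    """Return the largest candidate coefficient whose validation accuracy
    is at least the unpenalized baseline (stored under key 0.0), per the
    paper's stated tuning rule. `val_acc` maps coefficient -> accuracy;
    in practice these accuracies would come from actual training runs."""
    baseline = val_acc[0.0]
    feasible = [c for c in sorted(candidates) if val_acc[c] >= baseline]
    return max(feasible) if feasible else None

# Illustrative usage with made-up accuracy numbers:
train, val, test = split_local_dataset(list(range(100)))
# -> 72 / 8 / 20 samples per split for a 100-sample local dataset
coef = select_coefficient(
    [0.01, 0.1, 1.0, 10.0],
    {0.0: 0.90, 0.01: 0.91, 0.1: 0.90, 1.0: 0.88, 10.0: 0.80},
)
# -> 0.1, the largest coefficient not dropping below the 0.90 baseline
```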