Fair yet Asymptotically Equal Collaborative Learning

Authors: Xiaoqiang Lin, Xinyi Xu, See-Kiong Ng, Chuan-Sheng Foo, Bryan Kian Hsiang Low

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate in two settings with real-world streaming data, federated online incremental learning and federated reinforcement learning, that our proposed approach outperforms existing baselines in fairness and learning performance while remaining competitive in preserving equality. Our implementation is publicly available at https://github.com/xqlin98/Fair-yet-Equal-CML.
Researcher Affiliation | Collaboration | 1Department of Computer Science, National University of Singapore, Singapore. 2Institute for Infocomm Research, A*STAR, Singapore. 3Institute of Data Science, National University of Singapore, Singapore.
Pseudocode | Yes | Algorithm 1 outlines our framework, where lines 1-3 correspond to contribution evaluation (explore) in Sec. 3.1 and line 4 corresponds to incentive realization (exploit) in Sec. 3.2. Our proposed algorithm first performs contribution evaluation until ψt converges, after which it uses ψt to design a sampling distribution as in Eq. (2) and follows it for the remaining training to realize the incentives.
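The explore-then-exploit loop described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the function names (`evaluate_contributions`, `train_round`), the convergence tolerance `tol`, and the way β enters the sampling weights are all assumptions, since the exact form of Eq. (2) is not quoted here.

```python
import random

def explore_then_exploit(num_nodes, total_iters, evaluate_contributions,
                         train_round, beta=1 / 150, k=12, tol=1e-3):
    """Sketch: estimate contributions psi_t until convergence (explore),
    then sample k nodes per round from a psi-based distribution (exploit)."""
    psi = [0.0] * num_nodes
    t = 0
    # Explore: refine the contribution estimates psi_t each round until
    # they stop changing by more than tol (assumed convergence criterion).
    while t < total_iters:
        new_psi = evaluate_contributions(psi)
        t += 1
        converged = max(abs(a - b) for a, b in zip(new_psi, psi)) < tol
        psi = new_psi
        if converged:
            break
    # Exploit: build a sampling distribution from psi (beta used here as a
    # floor for numerical stability -- an assumption, not Eq. (2) itself).
    weights = [max(p, beta) for p in psi]
    total = sum(weights)
    probs = [w / total for w in weights]
    for _ in range(t, total_iters):
        selected = random.choices(range(num_nodes), weights=probs, k=k)
        train_round(selected)
    return probs
```

Under this sketch, higher-contribution nodes are selected more often in the exploit phase, which is how the framework realizes its incentives.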
Open Source Code | Yes | Our implementation is publicly available at https://github.com/xqlin98/Fair-yet-Equal-CML.
Open Datasets | Yes | We perform experiments on the following datasets: (a) image classification tasks: MNIST (LeCun et al., 1990) and CIFAR-10 (Krizhevsky, 2009); (b) medical image classification task: Pathmnist (PATH) (Yang et al., 2021), containing images of tissues related to colorectal cancer; (c) high-frequency trading dataset (HFT) (Ntakaris et al., 2018), a time-series task to predict the increase/decrease of the bid price of the financial instrument; and (d) electricity load prediction task (ELECTRICITY) (Muehlenpfordt, 2020), a time-series task to predict the electricity load in Germany every 15 minutes.
Dataset Splits | No | No explicit details on train/validation/test dataset splits (percentages, sample counts, or references to predefined splits) are provided; the text only mentions "validation loss" and "test accuracy" in figures and discussions.
Hardware Specification | Yes | All experiments were run on a server with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz, 256GB RAM, and 4 NVIDIA GeForce RTX 3080 GPUs.
Software Dependencies | No | No specific software dependencies, such as library names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA), are provided. The text only mentions "stochastic gradient descent" as the optimization algorithm.
Experiment Setup | Yes | In each iteration t, each of the N = 30 nodes trains on its own latest data si,t for E = 1 epoch. For contribution evaluation, we use the stopping criterion with α = 0.7, τ = 15 on 10 sub-sampled nodes. The total number of iterations is the same for all baselines (including ours): 150 for CIFAR-10, HFT, ELECTRICITY, and PATH, and 130 for MNIST. For incentive realization, k = 12 (i.e., a 40% ratio) and β = 1/150. For FedAvg, q-FFL, and FGFL, the selection ratio is 40%. For MNIST, we use the same CNN model as in Sec. 2. The optimization algorithm is stochastic gradient descent with a learning rate of 0.002 on si,t as a batch (for MNIST, |si,t| = 3).
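For reference, the reported hyperparameters can be consolidated into a single configuration sketch. The key names below are ours, not the authors'; only the values are taken from the quoted setup.

```python
# Hypothetical consolidation of the reported hyperparameters (key names ours).
CONFIG = {
    "num_nodes": 30,           # N
    "local_epochs": 1,         # E
    "alpha": 0.7,              # stopping-criterion parameter
    "tau": 15,                 # stopping-criterion parameter
    "subsampled_nodes": 10,    # nodes sub-sampled for contribution evaluation
    "total_iters": {           # same iteration budget for all baselines
        "CIFAR-10": 150, "HFT": 150, "ELECTRICITY": 150,
        "PATH": 150, "MNIST": 130,
    },
    "k": 12,                   # nodes selected per round (40% of 30)
    "beta": 1 / 150,
    "selection_ratio": 0.4,    # used for FedAvg, q-FFL, and FGFL
    "optimizer": "sgd",
    "learning_rate": 0.002,
    "mnist_batch_size": 3,     # |si,t| for MNIST
}

# Sanity check: k matches the stated 40% selection ratio.
assert CONFIG["k"] / CONFIG["num_nodes"] == CONFIG["selection_ratio"]
```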