Competitive-Cooperative Multi-Agent Reinforcement Learning for Auction-based Federated Learning

Authors: Xiaoli Tang, Han Yu

IJCAI 2023

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
"Extensive experiments on six commonly adopted benchmark datasets show that MARL-AFL is significantly more advantageous compared to six state-of-the-art approaches, outperforming the best by 12.2%, 1.9% and 3.4% in terms of social welfare, revenue and accuracy, respectively."

Researcher Affiliation | Academia
"Xiaoli Tang and Han Yu, School of Computer Science and Engineering, Nanyang Technological University, Singapore. {xiaoli001, han.yu}@ntu.edu.sg"

Pseudocode | Yes
"Algorithm 1: Learning Θi in Eq. (3); Algorithm 2: MARL-AFL"

Open Source Code | No
The paper makes no explicit statement about source-code availability and provides no link to a code repository for the described methodology.

Open Datasets | Yes
"To evaluate the performance of MARL-AFL, we conduct experiments based on six commonly used datasets in FL studies, including MNIST (http://yann.lecun.com/exdb/mnist/), CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html), Fashion-MNIST (i.e., FMNIST) [Xiao et al., 2017], EMNIST-digits (i.e., EMNIST-D), EMNIST-letters (i.e., EMNIST-L) [Cohen et al., 2017] and Kuzushiji-MNIST (i.e., KMNIST) [Clanuwat et al., 2018]."

Dataset Splits | Yes
"Both the test set and the validation set for each data consumer include 2,000 samples."
(see the data-loading sketch below)

Hardware Specification | No
The paper does not describe the hardware used for its experiments, such as specific GPU or CPU models. It mentions training FL models and using a VGG11 network, but gives no hardware specifics.

Software Dependencies | No
The paper describes the neural network architectures, the optimizer (RMSprop), the learning rate, the discount factor, and other hyperparameters, but it does not specify software versions for the programming language, libraries (e.g., PyTorch, TensorFlow), or other key components.

Experiment Setup | Yes
"The proposed method utilizes fully connected neural networks with three hidden layers, each containing 64 nodes, to generate bid prices for data owners on behalf of their respective data consumers. The action-value functions Qi and Q̄i are trained using a replay buffer D with a size of 5,000. During training, the agents explore the environment using an ε-greedy policy with an annealing rate from 1.0 to 0.05. To update Qi, 32 episodes uniformly sampled from D are used for each training step, and Q̄i is updated twice after each episode to speed up convergence. The target networks of Qi and Q̄i are updated once every 20 training episodes. We use RMSprop with a learning rate of 0.0005 to train all neural networks, and set the discount factor γ to 0.99 and the temperature hyperparameter τ to 4."
(see the configuration sketch below)

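All six datasets quoted in the Open Datasets row have torchvision counterparts, so the evaluation-data pipeline can be sketched concretely. The snippet below is a hypothetical reconstruction, not the authors' code (none is released): the root directory, the fixed seed, and the consumer_eval_split helper are assumptions; only the dataset choices and the 2,000-sample validation and test sizes come from the rows above.

```python
# Hypothetical reconstruction of the evaluation-data pipeline; everything
# except the dataset names and the 2,000-sample split sizes is an assumption.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
root = "./data"  # assumed download directory

# The six benchmark datasets named in the Open Datasets row.
test_pools = {
    "MNIST":    datasets.MNIST(root, train=False, download=True, transform=to_tensor),
    "CIFAR-10": datasets.CIFAR10(root, train=False, download=True, transform=to_tensor),
    "FMNIST":   datasets.FashionMNIST(root, train=False, download=True, transform=to_tensor),
    "EMNIST-D": datasets.EMNIST(root, split="digits", train=False, download=True, transform=to_tensor),
    "EMNIST-L": datasets.EMNIST(root, split="letters", train=False, download=True, transform=to_tensor),
    "KMNIST":   datasets.KMNIST(root, train=False, download=True, transform=to_tensor),
}

def consumer_eval_split(pool, n=2000, seed=0):
    """Carve a 2,000-sample validation set and a 2,000-sample test set
    for one data consumer out of a held-out pool (seed is an assumption)."""
    gen = torch.Generator().manual_seed(seed)
    val, test, _ = random_split(pool, [n, n, len(pool) - 2 * n], generator=gen)
    return val, test

val_set, test_set = consumer_eval_split(test_pools["MNIST"])
print(len(val_set), len(test_set))  # -> 2000 2000
```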
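The Experiment Setup row likewise pins down enough numbers for a minimal PyTorch sketch of one bidding agent's networks and training hyperparameters. BidNet, the state and action dimensions, and the annealing horizon are assumptions rather than the authors' implementation; only the quoted values (three 64-node hidden layers, buffer size 5,000, ε annealed from 1.0 to 0.05, 32 sampled episodes per step, target updates every 20 episodes, RMSprop at 0.0005, γ = 0.99, τ = 4) are grounded in the paper.

```python
# Minimal sketch of one bidding agent's setup using the quoted
# hyperparameters; names and state/action sizes are assumptions.
import random
from collections import deque
import torch
import torch.nn as nn

class BidNet(nn.Module):
    """Fully connected network with three 64-node hidden layers."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),  # one output per candidate bid price
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

STATE_DIM, N_ACTIONS = 8, 10              # placeholder dimensions
q_net = BidNet(STATE_DIM, N_ACTIONS)
target_net = BidNet(STATE_DIM, N_ACTIONS)
target_net.load_state_dict(q_net.state_dict())

replay_buffer = deque(maxlen=5000)        # replay buffer D, size 5,000
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=5e-4)
GAMMA = 0.99                              # discount factor γ
TAU = 4                                   # temperature τ (its role in the loss is not quoted)
BATCH_EPISODES = 32                       # episodes sampled uniformly from D per training step
TARGET_SYNC_EVERY = 20                    # target-network update period, in episodes

def epsilon(step: int, start=1.0, end=0.05, anneal_steps=10_000) -> float:
    """ε-greedy schedule annealed from 1.0 to 0.05 (the horizon is an assumption)."""
    frac = min(step / anneal_steps, 1.0)
    return start + frac * (end - start)

def act(state: torch.Tensor, step: int) -> int:
    """ε-greedy action selection over candidate bid prices."""
    if random.random() < epsilon(step):
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())
```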