Fair Algorithms for Multi-Agent Multi-Armed Bandits
Authors: Safwan Hossain, Evi Micha, Nisarg Shah
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are N agents and K arms, and pulling an arm generates a (possibly different) stochastic reward for each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the best arm ; indeed, each agent may perceive a different arm to be the best for her personally. Instead, we seek to learn a fair distribution over the arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms and show that they achieve sublinear regret, which is now measured in terms of the lost Nash social welfare. ... Our work is purely theoretical in nature, and we do not anticipate any immediate societal risks. |
| Researcher Affiliation | Academia | Safwan Hossain Department of Computer Science University of Toronto safwan.hossain@mail.utoronto.ca Evi Micha Department of Computer Science University of Toronto emicha@cs.toronto.edu Nisarg Shah Department of Computer Science University of Toronto nisarg@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1: UCB Input: Number of agents N, number of arms K Parameters :Confidence parameter αt for each t N // Pull each arm once for t = 1, . . . , K do pt policy that puts probability 1 on arm t // Pull arm t deterministically end for t = K + 1, . . . do Compute the estimated reward matrix bµt pt arg maxp K NSW(p, bµt) + αt P j [K] pj rt j, where rt j q |
| Open Source Code | No | The paper states 'Our work is purely theoretical in nature' and does not mention releasing open-source code for the described algorithms or providing any links to code repositories. |
| Open Datasets | No | The paper is theoretical and does not involve empirical studies using datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical studies, therefore no training/validation/test splits are discussed. |
| Hardware Specification | No | The paper is purely theoretical and does not report on experimental setups or hardware used. |
| Software Dependencies | No | The paper is purely theoretical and does not describe software dependencies with version numbers, as it does not report on experimental implementations. |
| Experiment Setup | No | The paper is purely theoretical and does not detail experimental setups or hyperparameters. |