Cooperative Multi-Agent Fairness and Equivariant Policies

Authors: Niko A. Grupen, Bart Selman, Daniel D. Lee

AAAI 2022, pp. 9350-9359

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We refer to our multi-agent learning strategy as Fairness through Equivariance (Fair-E) and demonstrate its effectiveness empirically. We then introduce Fairness through Equivariance Regularization (Fair-ER) as a soft-constraint version of Fair-E and show that it reaches higher levels of utility than Fair-E and fairer outcomes than non-equivariant policies. (A sketch of this regularized objective follows the table.)
Researcher Affiliation | Academia | Niko A. Grupen^1, Bart Selman^1, Daniel D. Lee^2; ^1 Cornell University, Ithaca, NY; ^2 Cornell Tech, New York, NY; {nag83, bs54m, ddl46}@cornell.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper describes the pursuit-evasion game setup but does not provide concrete access information (a specific link, DOI, repository name, or formal citation) for a publicly available dataset corresponding to its experimental environment.
Dataset Splits | No | The paper does not provide dataset split information for validation (e.g., percentages, sample counts, or citations to predefined validation splits).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory amounts) for running its experiments.
Software Dependencies | No | The paper mentions using DDPG but does not specify ancillary software details such as library or solver names with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | In each experiment, $n = 3$ pursuer agents are trained in a decentralized manner (each following DDPG) for a total of 125,000 episodes, during which velocity is decreased from $|v_p| = 1.2$ to $|v_p| = 0.4$. The evader speed is fixed at $|v_e| = 1.0$. ... The regularized objective is $J(\phi_i) + \lambda J_{\mathrm{eqv}}(\phi_1, \ldots, \phi_i, \ldots, \phi_n)$ (Eq. 10), where $\lambda$ is a fairness control parameter weighting the strength of equivariance. (Sketches of this objective and the speed schedule follow the table.)
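
The Fair-ER objective quoted above (Eq. 10) adds an equivariance regularizer to each agent's own DDPG objective. The excerpt does not spell out the form of $J_{\mathrm{eqv}}$, so the sketch below assumes a simple squared-distance penalty between agents' policy outputs on shared observations; `equivariance_penalty`, `fair_er_loss`, and the value of `lam` are hypothetical names and choices, not the paper's code.

```python
import torch

def equivariance_penalty(policies, observations):
    """Hypothetical surrogate for J_eqv: mean squared disagreement between
    agents' policy outputs on the same observations. A fully
    identity-equivariant team (as in Fair-E, where one policy is shared)
    incurs zero penalty. The paper's exact form of J_eqv may differ."""
    n = len(policies)
    total = torch.tensor(0.0)
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            for obs in observations:
                # Squared L2 distance between agent i's and agent j's
                # actions for the same observation.
                total = total + ((policies[i](obs) - policies[j](obs)) ** 2).mean()
                pairs += 1
    return total / max(pairs, 1)

def fair_er_loss(actor_loss_i, policies, observations, lam=0.5):
    """Gradient-descent form of Eq. (10), J(phi_i) + lambda * J_eqv:
    the DDPG actor loss is the negative of the return objective J(phi_i),
    and the penalty stands in for J_eqv up to sign convention. lam, the
    fairness control parameter, is set to a placeholder value here."""
    return actor_loss_i + lam * equivariance_penalty(policies, observations)
```

Setting `lam = 0` recovers independent, non-equivariant training, while a large `lam` pushes the team toward the hard-constrained Fair-E behavior, matching the soft-constraint reading of Fair-ER in the excerpt.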
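
The setup row also describes a speed curriculum: pursuer speed drops from $|v_p| = 1.2$ to $|v_p| = 0.4$ over the 125,000 training episodes while the evader stays at $|v_e| = 1.0$. The excerpt does not state the decay shape, so this sketch assumes a linear anneal; `pursuer_speed` is a hypothetical helper, and only the endpoints and episode count come from the paper's setup.

```python
def pursuer_speed(episode, total_episodes=125_000, v_start=1.2, v_end=0.4):
    """Linearly anneal pursuer speed |v_p| from v_start to v_end over
    training. The linear shape is an assumption; the evader speed |v_e|
    is held fixed at 1.0 throughout."""
    frac = min(episode / total_episodes, 1.0)
    return v_start + frac * (v_end - v_start)

# Example: pursuer speed at the start, middle, and end of training.
print(pursuer_speed(0))        # 1.2
print(pursuer_speed(62_500))   # 0.8
print(pursuer_speed(125_000))  # 0.4
```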