Integrated Cooperation and Competition in Multi-Agent Decision-Making

Authors: Kyle Wray, Akshat Kumar, Shlomo Zilberstein

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments consider two GD-CCPs which both uniquely combine the spirit of the standard cooperative domain Meeting in a Grid (Amato, Bernstein, and Zilberstein 2010) with the spirit of two competitive domains: Battle of the Sexes and Prisoner's Dilemma (Fudenberg and Tirole 1991). We evaluate our approximate CCP algorithm in simulation with three different amounts of controller nodes with |Qi| ∈ {2, 4, 6} for each agent i in Figure 2. In each scenario, we evaluate the average discounted reward (ADR) vs. the allotted slack (δ). The ADR averages over 1000 trials for each scenario (i.e., each point). The standard error is provided as error bars. We implemented the Prisoner Meeting and Battle Meeting domains on two real robot platforms (Figure 3).
Researcher Affiliation | Academia | Kyle Hollins Wray (1), Akshat Kumar (2), Shlomo Zilberstein (1); (1) College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA; (2) School of Information Systems, Singapore Management University, Singapore
Pseudocode | Yes | Algorithm 1 presents a scalable FSC solution to CCPs that assumes a given tuple of fixed-size FSC nodes Q. ... Algorithm 1 Approximate FSC Solution to GD-CCP
Open Source Code | No | Towards this goal, we will provide our source code to support further development of models that generalize cooperation and competition under a unified approach.
Open Datasets | No | Our experiments consider two GD-CCPs which both uniquely combine the spirit of the standard cooperative domain Meeting in a Grid (Amato, Bernstein, and Zilberstein 2010) with the spirit of two competitive domains: Battle of the Sexes and Prisoner's Dilemma (Fudenberg and Tirole 1991). We consider two novel domains called Battle Meeting and Prisoner Meeting. In both, there are two agents I = {1, 2} and the state space is S = S1 × S2 with Si = {top left, top right, bottom left, bottom right}. It has action space A = A1 × A2 with Ai = {none, north, south, east, west} and observation space Ω = {no bump, bump}. The state transitions T are defined as found in Figure 1.
Dataset Splits | No | We evaluate our approximate CCP algorithm in simulation with three different amounts of controller nodes with |Qi| ∈ {2, 4, 6} for each agent i in Figure 2. In each scenario, we evaluate the average discounted reward (ADR) vs. the allotted slack (δ). The ADR averages over 1000 trials for each scenario (i.e., each point).
Hardware Specification | No | We solve the NLPs and CCLPs in Tables 1 and 2 using the NEOS Server (Czyzyk, Mesnier, and Moré 1998) running SNOPT (Gill, Murray, and Saunders 2005). We implemented the Prisoner Meeting and Battle Meeting domains on two real robot platforms (Figure 3).
Software Dependencies | No | We solve the NLPs and CCLPs in Tables 1 and 2 using the NEOS Server (Czyzyk, Mesnier, and Moré 1998) running SNOPT (Gill, Murray, and Saunders 2005).
Experiment Setup | Yes | We evaluate our approximate CCP algorithm in simulation with three different amounts of controller nodes with |Qi| ∈ {2, 4, 6} for each agent i in Figure 2. All other normal cases terminate when max_j d_j < ϵ = 0.01 as in Algorithm 1. We allowed for a maximum of 50 iterations, occasionally causing early termination of the best response dynamics. Lastly, we have a discount factor of γ = 0.95.
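
The Open Datasets row describes the shared structure of the Battle Meeting and Prisoner Meeting domains. Below is a minimal sketch of that structure, assuming an itertools-based construction of the joint spaces; the transition function T and the rewards are defined in Figure 1 of the paper and are not reproduced here, so the transition stub is only a placeholder.

```python
# Sketch of the Battle Meeting / Prisoner Meeting domain spaces (assumed naming).
from itertools import product

AGENTS = (1, 2)
CELLS = ("top left", "top right", "bottom left", "bottom right")  # S_i per agent
ACTIONS = ("none", "north", "south", "east", "west")              # A_i per agent
OBSERVATIONS = ("no bump", "bump")                                # Omega

# Joint state and action spaces are Cartesian products over the two agents.
JOINT_STATES = list(product(CELLS, repeat=len(AGENTS)))    # S = S1 x S2
JOINT_ACTIONS = list(product(ACTIONS, repeat=len(AGENTS))) # A = A1 x A2

def transition(joint_state, joint_action):
    """Placeholder for T as defined in Figure 1 of the paper (not reproduced here)."""
    raise NotImplementedError
```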
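The Pseudocode and Experiment Setup rows mention best response dynamics that stop when the largest per-agent improvement falls below ϵ = 0.01 or after 50 iterations. The sketch below shows only that outer loop under those stated settings; the `best_response` callable stands in for the paper's NLP/CCLP subproblem (solved with SNOPT via NEOS) and is an assumption, not the authors' Algorithm 1.

```python
# Hedged sketch of a best-response outer loop with the paper's stated termination settings.
def solve_ccp(controllers, best_response, eps=0.01, max_iters=50):
    """controllers: list of per-agent FSCs; best_response: assumed subproblem solver."""
    for _ in range(max_iters):
        deltas = []
        for i, fsc in enumerate(controllers):
            new_fsc, d_i = best_response(i, fsc, controllers)  # improvement d_i for agent i
            controllers[i] = new_fsc
            deltas.append(d_i)
        if max(deltas) < eps:  # terminate when max_j d_j < eps
            break
    return controllers
```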
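The Dataset Splits and Experiment Setup rows state that each reported point is the average discounted reward over 1000 trials with γ = 0.95, with standard error shown as error bars. This is a small sketch of that estimate; `simulate_trial`, which returns one trial's per-step reward sequence, is an assumed helper.

```python
# Sketch of the ADR estimate: mean discounted return over repeated trials plus its standard error.
import math
import statistics

def adr(simulate_trial, trials=1000, gamma=0.95):
    returns = []
    for _ in range(trials):
        rewards = simulate_trial()  # assumed: list of per-step rewards for one trial
        returns.append(sum(gamma ** t * r for t, r in enumerate(rewards)))
    mean = statistics.mean(returns)
    stderr = statistics.stdev(returns) / math.sqrt(trials)
    return mean, stderr
```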