Integrated Cooperation and Competition in Multi-Agent Decision-Making
Authors: Kyle Wray, Akshat Kumar, Shlomo Zilberstein
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments consider two GD-CCPs which both uniquely combine the spirit of the standard cooperative domain Meeting in a Grid (Amato, Bernstein, and Zilberstein 2010) with the spirit of two competitive domains: Battle of the Sexes and Prisoner's Dilemma (Fudenberg and Tirole 1991). We evaluate our approximate CCP algorithm in simulation with three different amounts of controller nodes with \|Qi\| ∈ {2, 4, 6} for each agent i in Figure 2. In each scenario, we evaluate the average discounted reward (ADR) vs. the allotted slack (δ). The ADR averages over 1000 trials for each scenario (i.e., each point). The standard error is provided as error bars. We implemented the Prisoner Meeting and Battle Meeting domains on two real robot platforms (Figure 3). |
| Researcher Affiliation | Academia | Kyle Hollins Wray,¹ Akshat Kumar,² Shlomo Zilberstein¹ (¹College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA; ²School of Information Systems, Singapore Management University, Singapore) |
| Pseudocode | Yes | Algorithm 1 presents a scalable FSC solution to CCPs that assumes a given tuple of fixed-size FSC nodes Q. ... Algorithm 1 Approximate FSC Solution to GD-CCP |
| Open Source Code | No | Towards this goal, we will provide our source code to support further development of models that generalize cooperation and competition under a unified approach. |
| Open Datasets | No | Our experiments consider two GD-CCPs which both uniquely combine the spirit of the standard cooperative domain Meeting in a Grid (Amato, Bernstein, and Zilberstein 2010) with the spirit of two competitive domains: Battle of the Sexes and Prisoner's Dilemma (Fudenberg and Tirole 1991). We consider two novel domains called Battle Meeting and Prisoner Meeting. In both, there are two agents I = {1, 2} and the state space is S = S1 × S2 with Si = {top left, top right, bottom left, bottom right}. It has action space A = A1 × A2 with Ai = {none, north, south, east, west} and observation space Ω = {no bump, bump}. The state transitions T are defined as found in Figure 1. |
| Dataset Splits | No | We evaluate our approximate CCP algorithm in simulation with three different amounts of controller nodes with \|Qi\| ∈ {2, 4, 6} for each agent i in Figure 2. In each scenario, we evaluate the average discounted reward (ADR) vs. the allotted slack (δ). The ADR averages over 1000 trials for each scenario (i.e., each point). |
| Hardware Specification | No | We solve the NLPs and CCLPs in Tables 1 and 2 using the NEOS Server (Czyzyk, Mesnier, and Moré 1998) running SNOPT (Gill, Murray, and Saunders 2005). We implemented the Prisoner Meeting and Battle Meeting domains on two real robot platforms (Figure 3). |
| Software Dependencies | No | We solve the NLPs and CCLPs in Tables 1 and 2 using the NEOS Server (Czyzyk, Mesnier, and Moré 1998) running SNOPT (Gill, Murray, and Saunders 2005). |
| Experiment Setup | Yes | We evaluate our approximate CCP algorithm in simulation with three different amounts of controller nodes with \|Qi\| ∈ {2, 4, 6} for each agent i in Figure 2. All other normal cases terminate when max_j d_j < ϵ = 0.01 as in Algorithm 1. We allowed for a maximum of 50 iterations, occasionally causing early termination of the best response dynamics. Lastly, we have a discount factor of γ = 0.95. |
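
The Open Datasets row quotes enough of the Battle Meeting / Prisoner Meeting specification to write the joint spaces down directly. The sketch below is our own illustration of that structure, not the authors' code; all identifiers are assumed names.

```python
from itertools import product

# Hypothetical encoding of the spaces quoted in the Open Datasets row;
# names are ours, not the authors'.
CELLS = ["top left", "top right", "bottom left", "bottom right"]  # S_i
ACTIONS = ["none", "north", "south", "east", "west"]              # A_i
OBSERVATIONS = ["no bump", "bump"]                                # Omega

AGENTS = (1, 2)                                    # I = {1, 2}
STATES = list(product(CELLS, repeat=2))            # S = S1 x S2: 16 joint states
JOINT_ACTIONS = list(product(ACTIONS, repeat=2))   # A = A1 x A2: 25 joint actions

assert len(STATES) == 16 and len(JOINT_ACTIONS) == 25
```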
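The Research Type and Dataset Splits rows report average discounted reward (ADR) over 1000 trials per plot point, with standard error as error bars. A minimal sketch of that metric follows, assuming each trial yields a reward sequence; the `trial_rewards` input and both function names are hypothetical.

```python
import statistics

def discounted_return(rewards, gamma=0.95):
    """Discounted return of one trial: sum_t gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def adr(trial_rewards, gamma=0.95):
    """Average discounted reward over trials plus its standard error.

    `trial_rewards` is a list of per-trial reward sequences, e.g. one
    sequence for each of the 1000 simulation runs behind a plot point.
    """
    returns = [discounted_return(r, gamma) for r in trial_rewards]
    mean = statistics.fmean(returns)
    stderr = statistics.stdev(returns) / len(returns) ** 0.5
    return mean, stderr
```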
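The Experiment Setup row fixes the outer-loop termination rule for the best response dynamics of Algorithm 1: stop when max_j d_j < ϵ = 0.01, or after 50 iterations. The loop below is a sketch under those constants only; `best_response` (an NLP/CCLP solve in the paper) and `distance` (the change between successive controllers) are hypothetical placeholders supplied by the caller.

```python
EPS = 0.01       # epsilon from the quoted termination test
MAX_ITERS = 50   # iteration cap from the quoted setup
GAMMA = 0.95     # discount factor (used inside the solves, not shown here)

def best_response_dynamics(agents, initial, best_response, distance):
    """Iterate per-agent best responses until max_j d_j < EPS
    or the iteration cap is reached."""
    controllers = dict(initial)
    for _ in range(MAX_ITERS):
        deltas = []
        for i in agents:
            updated = best_response(i, controllers)         # placeholder solve
            deltas.append(distance(controllers[i], updated))
            controllers[i] = updated
        if max(deltas) < EPS:  # max_j d_j < epsilon
            break
    return controllers
```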