Planning for a Single Agent in a Multi-Agent Environment Using FOND
Authors: Christian Muise, Paolo Felli, Tim Miller, Adrian R. Pearce, Liz Sonenberg
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on existing and new multiagent benchmarks, demonstrating that modelling the other agents' goals improves the quality of the resulting solutions. [Section 4: Evaluation] We modified the FOND planner PRP as described above to create MA-PRP, and enabled it to parse custom PDDL that specifies the agents and their goals. To simplify the expression of joint actions, the domains enforce a round-robin execution of the agents. This setup is similar to the round robin games specified in the Game Description Language [Love et al., 2008], and allows us to adhere to Equation (1) effectively. As our approach opens planning to a new class of problems, there are no publicly available FP-MAP benchmarks to evaluate on as far as we are aware. Instead, we provide a suite of new benchmark problems for five domains: Blocksworld, Sokoban, Tic-Tac-Toe, Breakthrough, and Connect4. We use these to evaluate our proposed strategies for mitigating nondeterminism, and the general ability of the planner to solve fully-observable FP-MAP problems. We ran several planner configurations to generate the policies for a single planning agent, and we tested the generated policies using 100 simulated trials. |
| Researcher Affiliation | Academia | Christian Muise, Paolo Felli, Tim Miller, Adrian R. Pearce, Liz Sonenberg Department of Computing and Information Systems, University of Melbourne {christian.muise, paolo.felli, tmiller, adrianrp, l.sonenberg}@unimelb.edu.au |
| Pseudocode | Yes | Algorithm 1: Generate FP-MAP Strong Cyclic Plan |
| Open Source Code | No | The paper mentions modifying the PRP planner to create MA-PRP but does not provide any concrete means of access to the source code (e.g., a link or an explicit statement of release). |
| Open Datasets | No | As our approach opens planning to a new class of problems, there are no publicly available FP-MAP benchmarks to evaluate on as far as we are aware. Instead, we provide a suite of new benchmark problems for five domains: Blocksworld, Sokoban, Tic-Tac-Toe, Breakthrough, and Connect4. We use these to evaluate our proposed strategies for mitigating nondeterminism, and the general ability of the planner to solve fully-observable FP-MAP problems. |
| Dataset Splits | No | The paper describes running '100 simulated trials' on generated policies and using '500 monte-carlo roll-outs' for opponent moves, but it does not specify traditional training/validation/test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper states 'Every run was limited to 2Gb memory and 30min time limit' but does not specify any hardware details like GPU/CPU models or specific machine configurations used for the experiments. |
| Software Dependencies | No | The paper mentions modifying 'PRP [Muise et al., 2012]' and using 'custom PDDL', but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We ran several planner configurations to generate the policies for a single planning agent, and we tested the generated policies using 100 simulated trials. Moves for other agents were selected by taking the best applicable action measured using 500 monte-carlo roll-outs per action in the current state, with the stopping condition set to the agent's goal. Every run was limited to 2Gb memory and 30min time limit (unless otherwise specified). If the planner did not finish in the time provided, the best incumbent policy computed thus-far was used. The following planner configurations were considered: (30min) MA-PRP with 5 epochs and smart priority list (cf. Sec 3.2); ([3min|30sec]) reduced time limit; (1-Epoch) same as 30min with only one epoch; ([Plausible|Distance|Stack] OL) same as 30min with only action plausibility (respectively distance from initial state and original LIFO) used for the open list. |
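
The experiment-setup row above describes the paper's evaluation protocol only in prose. As a reading aid, the sketch below shows one plausible way such a protocol could be scripted: the planning agent's policy is tested over 100 simulated trials, and every other agent picks the applicable action with the best average outcome over 500 Monte Carlo roll-outs from the current state. This is not the authors' MA-PRP harness; the `Simulator` and `policy` interfaces, function names, and parameter names are assumptions introduced here purely for illustration, and only the constants (100 trials, 500 roll-outs per action) come from the paper.

```python
import random

# Hypothetical evaluation harness mirroring the protocol described in the paper.
# The Simulator/policy interfaces used below are assumptions, not part of MA-PRP.

def monte_carlo_score(sim, state, action, agent, rollouts=500):
    """Estimate the value of `action` for `agent` by random play-outs that stop
    when the agent's goal (the stopping condition in the paper) or a terminal
    state is reached."""
    total = 0.0
    for _ in range(rollouts):
        s = sim.apply(state, action)
        while not sim.goal_reached(s, agent) and not sim.terminal(s):
            mover = sim.agent_to_move(s)                     # round-robin turn order
            s = sim.apply(s, random.choice(sim.applicable_actions(s, mover)))
        total += 1.0 if sim.goal_reached(s, agent) else 0.0
    return total / rollouts

def select_opponent_action(sim, state, agent, rollouts_per_action=500):
    """Best applicable action for a non-planning agent, per roll-out score."""
    actions = sim.applicable_actions(state, agent)
    return max(actions,
               key=lambda a: monte_carlo_score(sim, state, a, agent, rollouts_per_action))

def evaluate_policy(sim, policy, planning_agent, num_trials=100):
    """Run the round-robin simulation and report the planning agent's success rate."""
    successes = 0
    for _ in range(num_trials):
        state = sim.initial_state()
        while not sim.terminal(state):
            agent = sim.agent_to_move(state)
            if agent == planning_agent:
                action = policy.action_for(state)            # policy produced by the planner
            else:
                action = select_opponent_action(sim, state, agent)
            state = sim.apply(state, action)
        successes += int(sim.goal_reached(state, planning_agent))
    return successes / num_trials
```

One consequence of such a protocol, worth keeping in mind when reproducing the numbers, is that the reported policy quality depends on the strength of the roll-out opponent as well as on the planner configuration being tested.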