Planning for a Single Agent in a Multi-Agent Environment Using FOND

Authors: Christian Muise, Paolo Felli, Tim Miller, Adrian R. Pearce, Liz Sonenberg

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our approach on existing and new multi-agent benchmarks, demonstrating that modelling the other agents' goals improves the quality of the resulting solutions. Section 4 (Evaluation): We modified the FOND planner PRP as described above to create MA-PRP, and enabled it to parse custom PDDL that specifies the agents and their goals. To simplify the expression of joint actions, the domains enforce a round-robin execution of the agents. This setup is similar to the round-robin games specified in the Game Description Language [Love et al., 2008], and allows us to adhere to Equation (1) effectively. As our approach opens planning to a new class of problems, there are no publicly available FP-MAP benchmarks to evaluate on as far as we are aware. Instead, we provide a suite of new benchmark problems for five domains: Blocksworld, Sokoban, Tic-Tac-Toe, Breakthrough, and Connect4. We use these to evaluate our proposed strategies for mitigating nondeterminism, and the general ability of the planner to solve fully-observable FP-MAP problems. We ran several planner configurations to generate the policies for a single planning agent, and we tested the generated policies using 100 simulated trials. (A sketch of the round-robin turn order appears after the table.)
Researcher Affiliation Academia Christian Muise, Paolo Felli, Tim Miller, Adrian R. Pearce, Liz Sonenberg Department of Computing and Information Systems, University of Melbourne {christian.muise, paolo.felli, tmiller, adrianrp, l.sonenberg}@unimelb.edu.au
Pseudocode Yes Algorithm 1: Generate FP-MAP Strong Cyclic Plan
Open Source Code No The paper mentions modifying the PRP planner to create MA-PRP but does not provide any concrete means of access to the source code (e.g., a link or an explicit statement of release).
Open Datasets No As our approach opens planning to a new class of problems, there are no publicly available FP-MAP benchmarks to evaluate on as far as we are aware. Instead, we provide a suite of new benchmark problems for five domains: Blocksworld, Sokoban, Tic-Tac-Toe, Breakthrough, and Connect4. We use these to evaluate our proposed strategies for mitigating nondeterminism, and the general ability of the planner to solve fully-observable FP-MAP problems.
Dataset Splits No The paper describes running '100 simulated trials' on generated policies and using '500 monte-carlo roll-outs' for opponent moves, but it does not specify traditional training/validation/test dataset splits with percentages or sample counts.
Hardware Specification No The paper states 'Every run was limited to 2Gb memory and 30min time limit' but does not specify any hardware details like GPU/CPU models or specific machine configurations used for the experiments.
Software Dependencies No The paper mentions modifying 'PRP [Muise et al., 2012]' and using 'custom PDDL', but it does not provide specific version numbers for any software dependencies.
Experiment Setup Yes We ran several planner configurations to generate the policies for a single planning agent, and we tested the generated policies using 100 simulated trials. Moves for other agents were selected by taking the best applicable action measured using 500 monte-carlo roll-outs per action in the current state, with the stopping condition set to the agent's goal. Every run was limited to 2Gb memory and a 30min time limit (unless otherwise specified). If the planner did not finish in the time provided, the best incumbent policy computed thus far was used. The following planner configurations were considered: (30min) MA-PRP with 5 epochs and smart priority list (cf. Sec 3.2); ([3min|30sec]) reduced time limit; (1-Epoch) same as 30min with only one epoch; ([Plausible|Distance|Stack] OL) same as 30min with only action plausibility (respectively, distance from the initial state and the original LIFO order) used for the open list. (A Monte Carlo roll-out sketch for the opponent move selection appears after the table.)
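The round-robin execution of agents noted under Research Type can be illustrated with a minimal sketch. The class name RoundRobinEnv, the agent labels, and the apply_action callback below are illustrative assumptions, not MA-PRP's interface or the paper's PDDL encoding.

```python
# Minimal sketch of the round-robin turn order the benchmark domains enforce:
# at every step exactly one agent acts, and control then rotates to the next
# agent. All names here (RoundRobinEnv, apply_action) are assumptions made
# for illustration; they are not taken from MA-PRP or the paper.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoundRobinEnv:
    agents: List[str]   # e.g. ["planning_agent", "opponent"]
    turn: int = 0       # index of the agent whose turn it currently is

    def current_agent(self) -> str:
        return self.agents[self.turn]

    def step(self, state, action, apply_action: Callable):
        """Apply the acting agent's move, then pass the turn to the next agent."""
        next_state = apply_action(state, self.current_agent(), action)
        self.turn = (self.turn + 1) % len(self.agents)
        return next_state
```

In a two-player domain such as Tic-Tac-Toe this simply alternates control between the planning agent and its opponent on every step.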
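The opponent move selection described under Experiment Setup (the best applicable action judged by 500 Monte Carlo roll-outs per action, with that agent's goal as the stopping condition) can be sketched as follows. The helper callables applicable_actions, apply_action, and goal_reached, as well as the roll-out depth cap, are assumptions for illustration and are not part of MA-PRP.

```python
# Hedged sketch of the opponent-move selection used in the simulated trials:
# each applicable action is scored with 500 uniformly random roll-outs from
# its successor state, a roll-out counts as a success if the agent's goal is
# reached, and the action with the highest success rate is played.
import random

ROLLOUTS_PER_ACTION = 500
MAX_ROLLOUT_DEPTH = 200  # assumed cutoff; the paper does not state one


def rollout_value(state, agent, applicable_actions, apply_action, goal_reached):
    """Play random moves until the agent's goal is reached or the depth cap hits."""
    for _ in range(MAX_ROLLOUT_DEPTH):
        if goal_reached(state, agent):
            return 1.0  # goal reached: count the roll-out as a success
        actions = applicable_actions(state)
        if not actions:
            return 0.0  # dead end: count the roll-out as a failure
        state = apply_action(state, random.choice(actions))
    return 0.0


def select_opponent_move(state, agent, applicable_actions, apply_action, goal_reached):
    """Pick the applicable action with the highest average roll-out value."""
    best_action, best_score = None, float("-inf")
    for action in applicable_actions(state):
        successor = apply_action(state, action)
        score = sum(
            rollout_value(successor, agent, applicable_actions, apply_action, goal_reached)
            for _ in range(ROLLOUTS_PER_ACTION)
        ) / ROLLOUTS_PER_ACTION
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

The sketch only captures the success-rate estimate implied by the description; tie-breaking and any richer reward signal are left unspecified, as they are in the evaluation text.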