Planning for a Single Agent in a Multi-Agent Environment Using FOND

Authors: Christian Muise, Paolo Felli, Tim Miller, Adrian R. Pearce, Liz Sonenberg

IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our approach on existing and new multi-agent benchmarks, demonstrating that modelling the other agents' goals improves the quality of the resulting solutions. Section 4 (Evaluation): We modified the FOND planner PRP as described above to create MA-PRP, and enabled it to parse custom PDDL that specifies the agents and their goals. To simplify the expression of joint actions, the domains enforce a round-robin execution of the agents. This setup is similar to the round-robin games specified in the Game Description Language [Love et al., 2008], and allows us to adhere to Equation (1) effectively. As our approach opens planning to a new class of problems, there are no publicly available FP-MAP benchmarks to evaluate on as far as we are aware. Instead, we provide a suite of new benchmark problems for five domains: Blocksworld, Sokoban, Tic-Tac-Toe, Breakthrough, and Connect4. We use these to evaluate our proposed strategies for mitigating nondeterminism, and the general ability of the planner to solve fully-observable FP-MAP problems. We ran several planner configurations to generate the policies for a single planning agent, and we tested the generated policies using 100 simulated trials. (A sketch of the round-robin turn order appears after the table.)
Researcher Affiliation Academia Christian Muise, Paolo Felli, Tim Miller, Adrian R. Pearce, Liz Sonenberg Department of Computing and Information Systems, University of Melbourne {christian.muise, paolo.felli, tmiller, adrianrp, l.sonenberg}@unimelb.edu.au
Pseudocode Yes Algorithm 1: Generate FP-MAP Strong Cyclic Plan
Open Source Code No The paper mentions modifying the PRP planner to create MA-PRP but does not provide any concrete means of access to the source code (e.g., a link or an explicit statement of release).
Open Datasets No As our approach opens planning to a new class of problems, there are no publicly available FP-MAP benchmarks to evaluate on as far as we are aware. Instead, we provide a suite of new benchmark problems for five domains: Blocksworld, Sokoban, Tic-Tac-Toe, Breakthrough, and Connect4. We use these to evaluate our proposed strategies for mitigating nondeterminism, and the general ability of the planner to solve fully-observable FP-MAP problems.
Dataset Splits No The paper describes running '100 simulated trials' on generated policies and using '500 monte-carlo roll-outs' for opponent moves, but it does not specify traditional training/validation/test dataset splits with percentages or sample counts.
Hardware Specification No The paper states 'Every run was limited to 2Gb memory and 30min time limit' but does not specify any hardware details like GPU/CPU models or specific machine configurations used for the experiments.
Software Dependencies No The paper mentions modifying 'PRP [Muise et al., 2012]' and using 'custom PDDL', but it does not provide specific version numbers for any software dependencies.
Experiment Setup Yes We ran several planner configurations to generate the policies for a single planning agent, and we tested the generated policies using 100 simulated trials. Moves for other agents were selected by taking the best applicable action measured using 500 monte-carlo roll-outs per action in the current state, with the stopping condition set to the agent's goal. Every run was limited to 2Gb memory and a 30min time limit (unless otherwise specified). If the planner did not finish in the time provided, the best incumbent policy computed thus far was used. The following planner configurations were considered: (30min) MA-PRP with 5 epochs and smart priority list (cf. Sec 3.2); ([3min|30sec]) reduced time limit; (1-Epoch) same as 30min with only one epoch; ([Plausible|Distance|Stack] OL) same as 30min with only action plausibility (respectively, distance from the initial state and the original LIFO order) used for the open list. (A Monte Carlo roll-out sketch for the opponent move selection appears after the table.)
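The round-robin execution of agents noted under Research Type can be illustrated with a minimal sketch. The class name RoundRobinEnv, the agent labels, and the apply_action callback below are illustrative assumptions, not MA-PRP's interface or the paper's PDDL encoding.

```python
# Minimal sketch of the round-robin turn order the benchmark domains enforce:
# at every step exactly one agent acts, and control then rotates to the next
# agent. All names here (RoundRobinEnv, apply_action) are assumptions made
# for illustration; they are not taken from MA-PRP or the paper.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RoundRobinEnv:
    agents: List[str]   # e.g. ["planning_agent", "opponent"]
    turn: int = 0       # index of the agent whose turn it currently is

    def current_agent(self) -> str:
        return self.agents[self.turn]

    def step(self, state, action, apply_action: Callable):
        """Apply the acting agent's move, then pass the turn to the next agent."""
        next_state = apply_action(state, self.current_agent(), action)
        self.turn = (self.turn + 1) % len(self.agents)
        return next_state
```

In a two-player domain such as Tic-Tac-Toe this simply alternates control between the planning agent and its opponent on every step.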
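The opponent move selection described under Experiment Setup (the best applicable action judged by 500 Monte Carlo roll-outs per action, with that agent's goal as the stopping condition) can be sketched as follows. The helper callables applicable_actions, apply_action, and goal_reached, as well as the roll-out depth cap, are assumptions for illustration and are not part of MA-PRP.

```python
# Hedged sketch of the opponent-move selection used in the simulated trials:
# each applicable action is scored with 500 uniformly random roll-outs from
# its successor state, a roll-out counts as a success if the agent's goal is
# reached, and the action with the highest success rate is played.
import random

ROLLOUTS_PER_ACTION = 500
MAX_ROLLOUT_DEPTH = 200  # assumed cutoff; the paper does not state one


def rollout_value(state, agent, applicable_actions, apply_action, goal_reached):
    """Play random moves until the agent's goal is reached or the depth cap hits."""
    for _ in range(MAX_ROLLOUT_DEPTH):
        if goal_reached(state, agent):
            return 1.0  # goal reached: count the roll-out as a success
        actions = applicable_actions(state)
        if not actions:
            return 0.0  # dead end: count the roll-out as a failure
        state = apply_action(state, random.choice(actions))
    return 0.0


def select_opponent_move(state, agent, applicable_actions, apply_action, goal_reached):
    """Pick the applicable action with the highest average roll-out value."""
    best_action, best_score = None, float("-inf")
    for action in applicable_actions(state):
        successor = apply_action(state, action)
        score = sum(
            rollout_value(successor, agent, applicable_actions, apply_action, goal_reached)
            for _ in range(ROLLOUTS_PER_ACTION)
        ) / ROLLOUTS_PER_ACTION
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

The sketch only captures the success-rate estimate implied by the description; tie-breaking and any richer reward signal are left unspecified, as they are in the evaluation text.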