Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Decision-Theoretic Planning Under Anonymity in Agent Populations

Authors: Ekhlas Sonu, Yingke Chen, Prashant Doshi

JAIR 2017 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The third contribution of this article is a comprehensive empirical evaluation of the methods on three new problem domains: policing large protests, controlling traffic congestion at a busy intersection, and improving the AI for the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems, and that our methods for speeding up the planning are effective.
Researcher Affiliation | Academia | Ekhlas Sonu EMAIL, Dept. of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305 USA; Yingke Chen EMAIL, College of Computer Science, Sichuan University, Sichuan, China; Prashant Doshi EMAIL, THINC Lab, Dept. of Computer Science, University of Georgia, Athens, GA 30602 USA
Pseudocode | Yes | Algorithm 1: Computing Pr(C_ν(·) | b_{0,l}(M_{1,l-1} | s), ..., b_{0,l}(M_{N,l-1} | s)); Algorithm 2: Initialize Node; Algorithm 3: Update Bounds; Algorithm 4: Branch; Algorithm 5: Bound
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions third-party software like 'Clash of Clans' but not its own implementation code.
Open Datasets | No | The paper describes three problem domains (policing protests, traffic congestion control, multiplayer video gaming/Clash of Clans) used for empirical evaluation. These are described as simulated environments or a game, rather than external, publicly available datasets with specific access information (links, DOIs, formal citations).
Dataset Splits | No | The paper uses simulated problem domains for its experiments, rather than specific datasets. Therefore, there are no dataset splits (e.g., train/test/validation) described.
Hardware Specification | Yes | All computations are carried out on a RHEL platform with 2.80 GHz processor and 4 GB of main memory.
Software Dependencies | No | The paper mentions a 'RHEL platform' but does not specify versions for any programming languages, libraries, frameworks, or specialized solvers used for implementation.
Experiment Setup | Yes | We set the maximum planning horizon to 5 in all the experiments. The transition, observation and reward functions of the many-agent I-POMDP are all compactly encoded as frame-action hypergraphs; example hypergraphs are shown in Fig. 6. Other agents are modeled as POMDPs and their predicted behavior is obtained using bounded policy iteration (Poupart & Boutilier, 2003). In our second set of experiments, we evaluate on settings involving many more agents. As Table 1 indicates, the traditional I-POMDP does not realistically scale to N > 5 agents.
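The algorithm names quoted in the Pseudocode row (Initialize Node, Update Bounds, Branch, Bound) follow the standard branch-and-bound pattern. As a hedged illustration only — this is a generic best-first skeleton with a hypothetical toy problem, not the authors' implementation — the pattern can be sketched as:

```python
import heapq

def branch_and_bound(root, upper_bound, lower_bound, branch):
    """Generic best-first branch and bound (maximization).

    upper_bound(node) -> optimistic estimate of the best completion of node
    lower_bound(node) -> value achievable from node (exact at leaves)
    branch(node)      -> list of child nodes (empty list for leaves)
    """
    best_value = lower_bound(root)               # incumbent solution value
    best_node = root
    frontier = [(-upper_bound(root), 0, root)]   # max-heap via negated bound
    counter = 1                                  # tie-breaker for the heap
    while frontier:
        neg_ub, _, node = heapq.heappop(frontier)
        if -neg_ub <= best_value:                # prune: cannot beat incumbent
            continue
        children = branch(node)                  # "Branch"
        if not children:                         # leaf: evaluate exactly
            value = lower_bound(node)
            if value > best_value:               # "Update Bounds"
                best_value, best_node = value, node
            continue
        for child in children:
            if upper_bound(child) > best_value:  # "Bound": keep promising nodes
                heapq.heappush(frontier, (-upper_bound(child), counter, child))
                counter += 1
    return best_node, best_value

# Toy usage (hypothetical): pick bits maximizing a weighted sum.
weights = [3, -1, 2]

def value(node):                                 # value of the assigned prefix
    return sum(w * b for w, b in zip(weights, node))

def ub(node):                                    # optimistically set the rest to 1
    return value(node) + sum(max(w, 0) for w in weights[len(node):])

def children_of(node):
    if len(node) == len(weights):
        return []
    return [node + (0,), node + (1,)]

best_node, best_value = branch_and_bound((), ub, value, children_of)
# best_node == (1, 0, 1), best_value == 5
```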
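The quoted Experiment Setup fixes the maximum planning horizon at 5. The general shape of horizon-bounded planning can be sketched with generic backward induction; the model below is a toy fully observable one with hypothetical states and rewards, not the many-agent I-POMDP evaluated in the paper:

```python
def finite_horizon_values(states, actions, horizon, reward, transition):
    """Backward induction: V_t(s) = max_a [R(s,a) + sum_s' P(s'|s,a) V_{t+1}(s')]."""
    V = {s: 0.0 for s in states}                 # value beyond the horizon
    for _ in range(horizon):                     # one backup per remaining step
        V = {
            s: max(
                reward(s, a) + sum(p * V[s2] for s2, p in transition(s, a))
                for a in actions
            )
            for s in states
        }
    return V

# Toy two-state, two-action model (hypothetical numbers).
states = ["calm", "congested"]
actions = ["wait", "signal"]
reward = lambda s, a: 1.0 if a == "signal" else 0.0
transition = lambda s, a: [(s, 1.0)]             # deterministic self-loop

V = finite_horizon_values(states, actions, horizon=5,
                          reward=reward, transition=transition)
# Reward 1 per step for "signal" over 5 steps gives V["calm"] == 5.0
```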