Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Decision-Theoretic Planning Under Anonymity in Agent Populations
Authors: Ekhlas Sonu, Yingke Chen, Prashant Doshi
JAIR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The third contribution of this article is a comprehensive empirical evaluation of the methods on three new problem domains: policing large protests, controlling traffic congestion at a busy intersection, and improving the AI for the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems, and that our methods for speeding up the planning are effective. |
| Researcher Affiliation | Academia | Ekhlas Sonu, Dept. of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305, USA; Yingke Chen, College of Computer Science, Sichuan University, Sichuan, China; Prashant Doshi, THINC Lab, Dept. of Computer Science, University of Georgia, Athens, GA 30602, USA |
| Pseudocode | Yes | Algorithm 1: Computing $\Pr(C^{\nu(\cdot)} \mid b_{0,l}(M_{1,l-1} \mid s), \ldots, b_{0,l}(M_{N,l-1} \mid s))$; Algorithm 2: Initialize Node; Algorithm 3: Update Bounds; Algorithm 4: Branch; Algorithm 5: Bound |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions third-party software like 'Clash of Clans' but not its own implementation code. |
| Open Datasets | No | The paper describes three problem domains (policing protests, traffic congestion control, multiplayer video gaming/Clash of Clans) used for empirical evaluation. These are described as simulated environments or a game, rather than external, publicly available datasets with specific access information (links, DOIs, formal citations). |
| Dataset Splits | No | The paper uses simulated problem domains for its experiments, rather than specific datasets. Therefore, there are no dataset splits (e.g., train/test/validation) described. |
| Hardware Specification | Yes | All computations are carried out on a RHEL platform with 2.80 GHz processor and 4 GB of main memory. |
| Software Dependencies | No | The paper mentions a 'RHEL platform' but does not specify versions for any programming languages, libraries, frameworks, or specialized solvers used for implementation. |
| Experiment Setup | Yes | We set the maximum planning horizon to 5 in all the experiments. The transition, observation and reward functions of the many-agent I-POMDP are all compactly encoded as frame-action hypergraphs; example hypergraphs are shown in Fig. 6. Other agents are modeled as POMDPs and their predicted behavior is obtained using bounded policy iteration (Poupart & Boutilier, 2003). In our second set of experiments, we evaluate on settings involving many more agents. As Table 1 indicates, the traditional I-POMDP does not realistically scale to N > 5 agents. |