Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Decision-Theoretic Planning Under Anonymity in Agent Populations
Authors: Ekhlas Sonu, Yingke Chen, Prashant Doshi
JAIR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The third contribution of this article is a comprehensive empirical evaluation of the methods on three new problem domains: policing large protests, controlling traffic congestion at a busy intersection, and improving the AI for the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems, and that our methods for speeding up the planning are effective. |
| Researcher Affiliation | Academia | Ekhlas Sonu EMAIL Dept of Aeronautics and Astronautics Stanford University Stanford, CA 94305 USA; Yingke Chen EMAIL College of Computer Science Sichuan University Sichuan, China; Prashant Doshi EMAIL THINC Lab, Dept of Computer Science University of Georgia Athens, GA 30602 USA |
| Pseudocode | Yes | Algorithm 1: Computing Pr(Cν(·) \| b0,l(M1,l−1 \| s), …, b0,l(MN,l−1 \| s)); Algorithm 2: Initialize Node; Algorithm 3: Update Bounds; Algorithm 4: Branch; Algorithm 5: Bound |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. It mentions third-party software like 'Clash of Clans' but not its own implementation code. |
| Open Datasets | No | The paper describes three problem domains (policing protests, traffic congestion control, multiplayer video gaming/Clash of Clans) used for empirical evaluation. These are described as simulated environments or a game, rather than external, publicly available datasets with specific access information (links, DOIs, formal citations). |
| Dataset Splits | No | The paper uses simulated problem domains for its experiments, rather than specific datasets. Therefore, there are no dataset splits (e.g., train/test/validation) described. |
| Hardware Specification | Yes | All computations are carried out on a RHEL platform with 2.80 GHz processor and 4 GB of main memory. |
| Software Dependencies | No | The paper mentions a 'RHEL platform' but does not specify versions for any programming languages, libraries, frameworks, or specialized solvers used for implementation. |
| Experiment Setup | Yes | We set the maximum planning horizon to 5 in all the experiments. The transition, observation and reward functions of the many-agent I-POMDP are all compactly encoded as frame-action hypergraphs; example hypergraphs are shown in Fig. 6. Other agents are modeled as POMDPs and their predicted behavior is obtained using bounded policy iteration (Poupart & Boutilier, 2003). In our second set of experiments, we evaluate on settings involving many more agents. As Table 1 indicates, the traditional I-POMDP does not realistically scale to N > 5 agents. |
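The pseudocode the paper provides (Initialize Node, Update Bounds, Branch, Bound) follows the standard branch-and-bound pattern. As a point of reference only, here is a minimal generic sketch of that pattern on a toy 0/1 knapsack problem; all names are hypothetical, and the paper's actual algorithms operate on I-POMDP belief nodes, not this toy task.

```python
# Generic branch-and-bound sketch: explore a binary decision tree, keep the
# best complete solution found so far (the incumbent), and prune any node
# whose optimistic upper bound cannot beat the incumbent.

def branch_and_bound(values, weights, capacity):
    n = len(values)
    # Sort items by value density so the fractional relaxation below is a
    # valid optimistic upper bound.
    order = sorted(range(n), key=lambda j: -values[j] / weights[j])
    values = [values[j] for j in order]
    weights = [weights[j] for j in order]
    best = [0]  # incumbent: best total value of any complete solution

    def upper_bound(i, value, remaining):
        # Optimistic bound: pack remaining items fractionally.
        bound = value
        for j in range(i, n):
            if weights[j] <= remaining:
                remaining -= weights[j]
                bound += values[j]
            else:
                bound += values[j] * remaining / weights[j]
                break
        return bound

    def branch(i, value, remaining):
        if value > best[0]:
            best[0] = value  # update the incumbent ("Update Bounds")
        if i == n or upper_bound(i, value, remaining) <= best[0]:
            return           # prune this subtree ("Bound")
        if weights[i] <= remaining:
            # branch on taking item i ("Branch")
            branch(i + 1, value + values[i], remaining - weights[i])
        # branch on skipping item i
        branch(i + 1, value, remaining)

    branch(0, 0, capacity)
    return best[0]
```

For example, `branch_and_bound([60, 100, 120], [10, 20, 30], 50)` returns `220`, pruning the subtree rooted at "skip the two densest items" because its fractional bound (120) cannot beat the incumbent.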