Bayesian Learning of Other Agents’ Finite Controllers for Interactive POMDPs
Authors: Alessandro Panella, Piotr Gmytrasiewicz
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experimental Results): We evaluate our approach on three problems of varying complexity. The first is the multiagent Tiger Problem introduced in (Gmytrasiewicz and Doshi 2005). The optimal (true) controller used by agent j has 5 nodes. |
| Researcher Affiliation | Academia | Alessandro Panella and Piotr Gmytrasiewicz Department of Computer Science University of Illinois at Chicago Chicago, IL 60607 |
| Pseudocode | Yes | Algorithm 1: `LearnPDFC(ω^i_{1:T}, M, R, S, N_iter)` |
| Open Source Code | No | The paper does not provide any concrete access to source code (no specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. It only mentions the implementation language: 'Implemented in MATLAB®'. |
| Open Datasets | Yes | We evaluate our approach on three problems of varying complexity. The first is the multiagent Tiger Problem introduced in (Gmytrasiewicz and Doshi 2005). The second problem is a variation of the 3x4 Maze problem described in (Russell and Norvig 2009). The third problem is a 5x5 instance of the UAUV reconnaissance problem described in (Zeng and Doshi 2012). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, and test sets. It refers to 'observed history (Tlearn)' and 'training sequences' which are generated in the simulated environment, not pre-split datasets. |
| Hardware Specification | Yes | Implemented in MATLAB® and running on an Intel® Xeon® 2.27 GHz processor. |
| Software Dependencies | No | The paper only mentions 'MATLAB®' without a specific version number, and does not list any other key software components or libraries with their version numbers required for replication. |
| Experiment Setup | Yes | The following parameters were used for the MCMC sampler: M = 50, R = 50, S = 2. In each trial, the MCMC sampler was run for 5000 iterations, and the second halves of the generated sample chains were subsampled every 100 iterations, resulting in ensembles of 25 PDFCs per trial. The performance of the resulting I-POMDPs was then computed by averaging the total reward collected during 1000 runs of the POMCP algorithm, with discount factor 0.9 and using 210 simulations for exploring the search tree at each step; all other POMCP parameters were set as in (Silver and Veness 2010). |
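
The ensemble size quoted in the Experiment Setup row follows from the chain length and thinning interval. The sketch below only checks that arithmetic; it is not the authors' MATLAB implementation. The function name `thinned_iterations` and the choice to keep the last iteration of each 100-step block are assumptions made for illustration.

```python
def thinned_iterations(n_iter=5000, thin=100):
    """Return the 1-based iteration numbers kept after discarding the
    first half of the chain and keeping every `thin`-th sample.

    Assumption: the kept samples are the last iteration of each
    `thin`-sized block in the second half of the chain.
    """
    burn_in = n_iter // 2  # first half of the chain is discarded
    return list(range(burn_in + thin, n_iter + 1, thin))


kept = thinned_iterations()
print(len(kept))  # number of PDFC samples in the ensemble for one trial
```

With 5000 iterations and a thinning interval of 100, the second half of the chain yields 25 retained samples, which matches the "ensembles of 25 PDFCs per trial" stated in the table.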