Belief-State Query Policies for User-Aligned POMDPs
Authors: Daniel Bramblett, Siddharth Srivastava
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on a diverse set of problems showing both the efficiency of our algorithm and the quality of the computed user-aligned policies. (Sec. 7). |
| Researcher Affiliation | Academia | Daniel Bramblett and Siddharth Srivastava, Autonomous Agents and Intelligent Robots Lab, School of Computing and Augmented Intelligence, Arizona State University, AZ, USA. {drbrambl,siddharths}@asu.edu |
| Pseudocode | Yes | Algorithm 1 Partition Refinement Search (PRS) |
| Open Source Code | Yes | Complete source code is available in the supplementary material. |
| Open Datasets | No | The paper defines problems like 'Lane merger', 'Spaceship repair', 'Graph rock sample', and 'Store visit', which appear to be simulation environments rather than public datasets with specified access information. |
| Dataset Splits | No | The paper discusses evaluating policies but does not specify training, validation, or test dataset splits. |
| Hardware Specification | Yes | All experiments were performed on an Intel(R) Xeon(R) W-2102 CPU @ 2.90GHz without using a GPU. |
| Software Dependencies | No | The paper describes the implementation using a manager-worker design pattern but does not specify versions for software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | The manager maintained the hypothesized optimal partition and current exploration rate. Table 3 shows the timeout and sample rate used for each problem for PRS... For Nelder-Mead optimization, we used a simplex that had vertices numbering one more than the number of parameters... For Particle Swarm optimization, 10 particles were used with the location and momentum of each particle clipped to the search space. The coefficients changed based on steps since the last improvement. |
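
The Software Dependencies and Experiment Setup rows mention a manager-worker implementation in which the manager maintains the hypothesized optimal partition and the current exploration rate. A minimal sketch of that pattern, assuming Python's standard multiprocessing module and hypothetical helpers for sampling and scoring candidate partitions (the paper does not describe these details), could look as follows.

```python
# Hedged sketch of a manager-worker evaluation loop. The helper names
# (`sample_candidate_partition`, `evaluate_partition`), the decay schedule,
# and the stopping rule are assumptions; the paper only states that a manager
# maintains the hypothesized optimal partition and the current exploration rate.
from multiprocessing import Pool
import random


def sample_candidate_partition(best, exploration_rate):
    """Hypothetical: perturb the current best partition parameters."""
    return [p + random.uniform(-exploration_rate, exploration_rate) for p in best]


def evaluate_partition(candidate):
    """Hypothetical worker task: score a candidate partition (placeholder objective)."""
    return sum(-(p ** 2) for p in candidate), candidate


def manager(initial, n_workers=4, rounds=50, exploration_rate=1.0):
    best, best_score = initial, evaluate_partition(initial)[0]
    with Pool(processes=n_workers) as pool:
        for _ in range(rounds):
            # Manager samples candidates; workers evaluate them in parallel.
            candidates = [sample_candidate_partition(best, exploration_rate)
                          for _ in range(n_workers)]
            for score, cand in pool.map(evaluate_partition, candidates):
                if score > best_score:
                    best, best_score = cand, score
            exploration_rate *= 0.95  # manager decays its exploration rate
    return best, best_score
```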
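
The Experiment Setup row quotes the paper's baseline optimizer configurations: a Nelder-Mead simplex with one more vertex than the number of parameters, and Particle Swarm with 10 particles whose location and momentum are clipped to the search space, with coefficients adapted based on steps since the last improvement. A minimal sketch of how those settings might be expressed, assuming SciPy's Nelder-Mead, a hand-rolled PSO loop, and a placeholder objective standing in for the paper's simulation-based policy evaluation, is below.

```python
# Hedged sketch of the baseline optimizer configurations. The objective
# `evaluate_policy_params` and all numeric defaults other than those quoted
# from the paper (10 particles, n + 1 simplex vertices) are assumptions.
import numpy as np
from scipy.optimize import minimize


def evaluate_policy_params(params: np.ndarray) -> float:
    """Hypothetical objective: a stand-in for simulated policy quality."""
    return float(np.sum(params ** 2))


def nelder_mead(x0: np.ndarray, step: float = 0.1):
    """Nelder-Mead with an initial simplex of (n + 1) vertices,
    one more than the number of parameters, as stated in the setup."""
    n = x0.size
    simplex = np.vstack([x0] + [x0 + step * np.eye(n)[i] for i in range(n)])
    return minimize(evaluate_policy_params, x0, method="Nelder-Mead",
                    options={"initial_simplex": simplex})


def particle_swarm(bounds: np.ndarray, n_particles: int = 10, iters: int = 200):
    """Minimal PSO with 10 particles; positions and velocities are clipped
    to the search space. The coefficient schedule (shrinking inertia after a
    streak of non-improving steps) is an assumption, not the paper's exact rule."""
    rng = np.random.default_rng(0)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pos = rng.uniform(lo, hi, size=(n_particles, lo.size))
    vel = np.zeros_like(pos)
    p_best = pos.copy()
    p_val = np.array([evaluate_policy_params(p) for p in pos])
    g_best, g_val = p_best[p_val.argmin()].copy(), p_val.min()
    inertia, c1, c2, stale = 0.7, 1.5, 1.5, 0
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        vel = np.clip(vel, lo - hi, hi - lo)   # clip momentum to the search space
        pos = np.clip(pos + vel, lo, hi)       # clip location to the search space
        vals = np.array([evaluate_policy_params(p) for p in pos])
        improved = vals < p_val
        p_best[improved], p_val[improved] = pos[improved], vals[improved]
        if p_val.min() < g_val:
            g_best, g_val, stale = p_best[p_val.argmin()].copy(), p_val.min(), 0
        else:
            stale += 1
            if stale > 5:
                inertia = max(0.4, inertia * 0.95)  # adapt coefficient after no improvement
    return g_best, g_val
```

Both routines treat policy quality as a black-box function of the parameter vector; in the paper this would be estimated by simulation rather than the placeholder objective used here.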