Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information
Authors: Genevieve Flaspohler, Nicholas A. Roy, John W. Fisher III
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In simulated tracking experiments, we achieve higher reward than both closed-loop and hand-coded macro-action baselines, selectively using VoI macro-actions to reduce planning complexity while maintaining near-optimal task performance. |
| Researcher Affiliation | Academia | Genevieve Flaspohler¹·², Nicholas Roy¹, and John W. Fisher III¹, Massachusetts Institute of Technology¹ and the Woods Hole Oceanographic Institution² {geflaspo, nickroy, fisher}@csail.mit.edu |
| Pseudocode | Yes | We modify the standard value iteration backup operation to compute the VoI, adding open-loop backups whenever the VoI is low. An algorithm summary is presented in the supplement. ... An algorithm summary for macro-action chaining is presented in the supplement. (A schematic sketch of this VoI-gated backup appears after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes a simulated tracking problem in a '10x10 discretized map' and mentions 'simulated tracking experiments', but it does not refer to a named, publicly available dataset with concrete access information (e.g., link, DOI, formal citation). |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, and test dataset splits with percentages or sample counts, nor does it reference predefined splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper states 'value function approximation is performed using a custom implementation of PBVI [15]', but does not provide specific version numbers for software libraries or dependencies. PBVI is an algorithm, not a specific software with a version. |
| Experiment Setup | Yes | We assume discrete states, actions and observations and represent the value function by a piecewise-linear and convex (PWLC) collection of α-vectors [9]. An adaptation of the algorithm presented in Section 4 to a PWLC value function is contained in the supplement (Section C). We demonstrate macro-action generation in a dynamic tracking problem (Fig. 4), in which a fully observable, actuated agent tracks a partially observable target moving in a known 10×10 discretized map (|S| = 10,000). A full description of the experimental domain and parameterization is in the supplement (Section D). ... The VoI macro-action policy (VoI MA) is compared against an approximation to the closed-loop optimal policy (base closed-loop, Base CL) and a fixed-length macro-action (Fixed MA) policy, which is constrained to act closed-loop only every T = 15 planning iterations. ... We additionally explore the effect of the VoI threshold τ on planner performance, macro-action utilization, and the value of δB. (An execution-loop sketch contrasting these policies follows the table.) |
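The pseudocode entry describes a value iteration backup that computes the VoI and substitutes an open-loop backup when the VoI is low. The paper's actual algorithm operates on a PWLC α-vector representation (a PBVI-style implementation, detailed in its supplement) and is not reproduced here. The following is a minimal sketch of the gating idea at a single belief point, assuming dense arrays `T[a][s, s'] = P(s'|s, a)`, `Z[s', o] = P(o|s')`, `R[a][s]`, a value callable `V`, and a threshold `tau`; all of these interfaces are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def predict_belief(b, a, T):
    """Open-loop prediction: propagate the belief through the dynamics only."""
    return T[a].T @ b  # b'(s') = sum_s P(s' | s, a) b(s)

def bayes_update(b_pred, o, Z):
    """Condition a predicted belief on observation o via Bayes' rule."""
    post = Z[:, o] * b_pred
    norm = post.sum()
    return post / norm if norm > 0 else b_pred

def voi_backup(b, V, T, Z, R, gamma, tau):
    """One belief-point backup gated on an estimated value of information.

    Returns the backed-up value and a flag indicating whether an open-loop
    (macro-action) backup sufficed. This is one plausible reading of the
    paper's idea, not its PWLC alpha-vector algorithm.
    """
    n_actions = T.shape[0]
    q_cl = np.empty(n_actions)  # closed-loop Q-values (condition on observations)
    q_ol = np.empty(n_actions)  # open-loop Q-values (ignore observations)
    for a in range(n_actions):
        r = b @ R[a]                      # expected immediate reward
        b_pred = predict_belief(b, a, T)
        q_ol[a] = r + gamma * V(b_pred)
        p_o = Z.T @ b_pred                # P(o | b, a)
        ev = sum(p * V(bayes_update(b_pred, o, Z))
                 for o, p in enumerate(p_o) if p > 0)
        q_cl[a] = r + gamma * ev
    voi = q_cl.max() - q_ol.max()         # value of the next observation
    if voi < tau:                         # observation not worth the planning cost:
        return q_ol.max(), True           # extend the open-loop macro-action
    return q_cl.max(), False              # act closed-loop
```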
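The experiment-setup entry contrasts three policies that differ only in when they replan closed-loop; the reported |S| = 10,000 is consistent with a joint state over the agent's and target's positions on the 10×10 grid (100 × 100 cells). The loop below is a hedged sketch of that comparison under a hypothetical `env`/`planner` interface; none of these method names come from the paper.

```python
def run_tracking_episode(env, planner, mode, tau=0.0, T_fixed=15, horizon=100):
    """Sketch of the three execution modes compared in the paper:
    'base_cl' replans every step, 'fixed_ma' replans every T_fixed = 15
    steps, and 'voi_ma' replans only when the estimated VoI exceeds tau.
    `env` and `planner` are assumed interfaces, not the authors' code.
    """
    b = env.initial_belief()
    total_reward = 0.0
    for t in range(horizon):
        if mode == "base_cl":
            closed_loop = True
        elif mode == "fixed_ma":
            closed_loop = (t % T_fixed == 0)     # fixed replanning schedule
        else:                                    # "voi_ma"
            closed_loop = planner.voi(b) >= tau  # is the observation worth it?
        a = (planner.closed_loop_action(b) if closed_loop
             else planner.open_loop_action(b))
        o, r = env.step(a)
        total_reward += r
        b = env.update_belief(b, a, o)
    return total_reward
```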