Policies that Generalize: Solving Many Planning Problems with the Same Policy

Authors: Blai Bonet, Hector Geffner

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We establish conditions under which memoryless policies and finite-state controllers that solve one partially observable non-deterministic problem (PONDP) generalize to other problems; namely, problems that have a similar structure and share the same action and observation space. Our aim in this paper is to provide a characterization of the common structure that allows for policy generalization. We use a logical setting where uncertainty is represented by sets of states and the goal is to be achieved with certainty. In this setting, the notions of solution policy and generalization become very crisp: a policy solves a problem or not, and it generalizes to another problem or not. We thus ignore considerations of costs, quality or rewards, and do not consider probabilities explicitly. Our results, however, do apply to the probabilistic setting where goals are to be achieved with probability 1.
Researcher Affiliation | Academia | Blai Bonet, Universidad Simón Bolívar, Caracas, Venezuela, bonet@ldc.usb.ve; Hector Geffner, ICREA & Universitat Pompeu Fabra, Barcelona, Spain, hector.geffner@upf.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link regarding the availability of open-source code for the methodology described.
Open Datasets | No | The paper uses conceptual examples to illustrate theoretical points, such as 'Countdown' and 'Dust-Cleaning Robot', but it does not mention or use any publicly available datasets in the context of experiments or training.
Dataset Splits | No | The paper does not describe empirical experiments, and therefore does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any empirical experiments, thus no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not describe any empirical experiments, thus no software dependencies with version numbers are provided.
Experiment Setup | No | The paper is theoretical and focuses on defining models and proving theorems, therefore it does not provide details about an experimental setup, hyperparameters, or training configurations.
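
The Research Type entry describes a logical setting in which a memoryless policy either solves a problem or does not, and may generalize to other problems that share the same actions and observations. The Python sketch below is only an illustration of that idea under stated assumptions: the `Problem` class, the `solves` check, and the toy `counter_problem` family are hypothetical names invented here, not the paper's definitions or its 'Countdown' example. As a simplification chosen for this sketch, an execution that revisits a state counts as a failure, so no fairness assumption is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set

State = int
Policy = Dict[str, str]  # memoryless policy: observation -> action


@dataclass(frozen=True)
class Problem:
    """Hypothetical container for a non-deterministic problem (an assumption)."""
    init: FrozenSet[State]                          # set of possible initial states
    goal: FrozenSet[State]                          # set of goal states
    successors: Callable[[State, str], Set[State]]  # non-deterministic transitions
    observe: Callable[[State], str]                 # observation seen in a state


def solves(policy: Policy, problem: Problem) -> bool:
    """Return True if every execution from every initial state reaches the goal.

    Simplification for this sketch: revisiting a state on the current path,
    an unmapped observation, or an inapplicable action all count as failure.
    """
    def reaches_goal(state: State, on_path: FrozenSet[State]) -> bool:
        if state in problem.goal:
            return True
        if state in on_path:
            return False
        action = policy.get(problem.observe(state))
        if action is None:
            return False
        next_states = problem.successors(state, action)
        if not next_states:
            return False
        # The goal must be reached whichever successor the environment picks.
        return all(reaches_goal(s, on_path | {state}) for s in next_states)

    return all(reaches_goal(s, frozenset()) for s in problem.init)


def counter_problem(n: int) -> Problem:
    """Toy family (an assumption): a counter in 0..n that 'dec' lowers by 1 or 2."""
    def successors(state: State, action: str) -> Set[State]:
        if action != "dec" or state == 0:
            return set()
        return {state - 1, max(state - 2, 0)}

    return Problem(
        init=frozenset(range(n + 1)),
        goal=frozenset({0}),
        successors=successors,
        observe=lambda s: "zero" if s == 0 else "positive",
    )


# One policy, several problem instances with the same actions and observations.
policy: Policy = {"positive": "dec"}
results = [solves(policy, counter_problem(n)) for n in (1, 3, 10)]
```

Here the same observation-to-action mapping is checked against three instances that differ only in the size of the state space, which is the kind of generalization the summary refers to; the check itself is a brute-force enumeration and is not a procedure taken from the paper.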