Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe

Authors: Sanghack Lee, Elias Bareinboim

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we investigate several properties of the class of mixed policies and provide an efficient and effective characterization, including optimality and non-redundancy. Specifically, we introduce a graphical criterion to identify unnecessary contexts for a set of actions, leading to a natural characterization of non-redundancy of mixed policies. We then derive sufficient conditions under which one strategy can dominate the other with respect to their maximum achievable expected rewards (optimality). This characterization leads to a fundamental understanding of the space of mixed policies and a possible refinement of the agent's strategy so that it converges to the optimum faster and more robustly.
Researcher Affiliation | Academia | Sanghack Lee, Elias Bareinboim; Causal Artificial Intelligence Laboratory, Columbia University; {sanghacklee,eb}@cs.columbia.edu
Pseudocode | Yes | We provide an efficient algorithm for obtaining a unique, maximal, non-redundant MPS (nr-mps, Alg. 2) of a given MPS in Appendix E [28].
Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not mention using any specific datasets for training or provide information about their availability.
Dataset Splits | No | The paper is theoretical and does not describe any experimental setups involving training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments or the hardware used to run them.
Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers that would be needed to replicate experimental results.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setups, hyperparameters, or training configurations.