Planning and Learning with Stochastic Action Sets
Authors: Craig Boutilier, Alon Cohen, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov, Dale Schuurmans
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we offer a simple empirical demonstration of the importance of accounting for stochastic action availability when computing an MDP policy. Additional discussion and full proofs of all results can be found in a longer version of this paper [Boutilier et al., 2018]. |
| Researcher Affiliation | Industry | Craig Boutilier, Alon Cohen, Avinatan Hassidim, Yishay Mansour, Ofer Meshi, Martin Mladenov and Dale Schuurmans Google Research {cboutilier,aloncohen,avinatan,mansour,meshi,schuurmans}@google.com |
| Pseudocode | No | No clearly labeled pseudocode or algorithm blocks were found. Algorithms are described in paragraph form. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology. It mentions open-source software only in the context of existing tools, not for the authors' own implementation. |
| Open Datasets | No | The paper uses "a real-world road network (Fig. 1) in the San Francisco Bay Area" for its empirical illustration but does not provide access information (link, DOI, citation) for this specific dataset. |
| Dataset Splits | No | The empirical illustration describes a routing problem without specifying dataset splits (e.g., training, validation, test percentages or sample counts). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned. |
| Experiment Setup | Yes | The optimal policies for different choices p = 0.1, 0.2 and 0.4 are depicted in Fig. 1, where line thickness and color indicate traversal probabilities under the corresponding optimal policies. We see that lower values of p lead to policies with more redundancy (i.e., more alternate routes). (An illustrative sketch of this kind of computation follows the table.) |
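
The Experiment Setup row describes optimal MDP policies computed for action-availability probabilities p = 0.1, 0.2 and 0.4. Since the paper releases no code, the snippet below is only a minimal sketch of one way a Bellman backup with stochastic action sets could be implemented, under assumptions of my own: each action is independently available with probability p, and a hypothetical stay-in-place fallback (zero reward) applies when no action is available. The function name, the toy MDP, and the fallback rule are illustrative and are not the authors' method.

```python
import numpy as np

def sas_value_iteration(P, R, p, gamma=0.95, n_iters=1000, tol=1e-8):
    """Illustrative value iteration under stochastic action availability.

    Assumptions (not taken from the paper's implementation):
      - each action is independently available with probability p,
      - if no action is available, the agent stays put with zero reward.

    P: transitions, shape (A, S, S) with P[a, s, s'] = Pr(s' | s, a)
    R: rewards, shape (S, A)
    p: per-step availability probability of each action
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(n_iters):
        # Standard Q-values, assuming the action could be taken.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        # Expected max over the random set of available actions:
        # the i-th best action (0-indexed) is the best *available* one
        # with probability p * (1 - p) ** i.
        Q_sorted = -np.sort(-Q, axis=1)
        weights = p * (1.0 - p) ** np.arange(A)
        none_available = (1.0 - p) ** A
        # Assumed fallback: stay in place with zero reward.
        V_new = Q_sorted @ weights + none_available * gamma * V
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q


if __name__ == "__main__":
    # Tiny 3-state, 2-action MDP, made up purely for illustration.
    P = np.array([
        [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # action 0
        [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # action 1
    ])
    R = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
    for p in (0.1, 0.2, 0.4):
        V, _ = sas_value_iteration(P, R, p)
        print(p, np.round(V, 3))
```

The sorted-Q weighting simply enumerates which ranked action turns out to be the best one available; lower p shifts weight toward lower-ranked alternatives, which is consistent with the paper's observation that smaller p yields policies with more redundancy.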