Robust partially observable Markov decision process
Authors: Takayuki Osogami
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments show that our point-based value iteration can adequately find robust policies. |
| Researcher Affiliation | Industry | Takayuki Osogami OSOGAMI@JP.IBM.COM IBM Research Tokyo, Tokyo, Japan |
| Pseudocode | Yes | Algorithm 1 Robust value iteration; Algorithm 2 Robust DP backup; Algorithm 3 Robust point-based DP backup |
| Open Source Code | No | The paper does not include any statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper mentions using "Heaven and Hell, a standard instance of a POMDP" in its numerical experiments. However, it does not provide concrete access information (like a specific link, DOI, repository name, or formal citation with authors/year) for this or any other dataset or environment used. |
| Dataset Splits | No | The paper describes a reinforcement learning/planning problem setup and numerical experiments, but it does not specify any training, validation, or test dataset splits in terms of percentages, sample counts, or predefined external splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the numerical experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, specific libraries, or solvers). |
| Experiment Setup | Yes | The agent moves one step at a time with a reward of −1 (unit cost). The agent obtains a reward of 1 upon reaching heaven or a reward of −10 upon reaching hell, and the episode then terminates. The agent seeks to maximize the expected cumulative reward with a discount rate of γ = 0.9. When pe is large, the agent should go directly to an arbitrary "?", because the cost of detouring to "!" for an observation pays off only when the observation is informative. A difficulty here is that pe is uncertain. |
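The robust decision described in the setup above — choosing an action by its worst case over the uncertain parameter pe — can be sketched as follows. This is not the paper's code: the uncertainty interval for pe, the detour length, and the payoff magnitudes (unit step cost, heaven +1, hell −10) are illustrative assumptions echoing the quoted setup.

```python
GAMMA = 0.9                                  # discount rate from the quoted setup
PE_GRID = [0.1 + 0.01 * i for i in range(31)]  # assumed uncertainty set: pe in [0.1, 0.4]

def value_go_direct(pe):
    # Walk straight to a guessed "?": reach heaven w.p. 1 - pe, hell otherwise.
    # Step cost, heaven/hell payoffs are assumptions mirroring the setup above.
    return -1.0 + GAMMA * ((1 - pe) * 1.0 + pe * -10.0)

def value_observe_first(pe, detour_steps=2):
    # Detour to "!" first (detour length is an assumption): the observation is
    # informative w.p. 1 - pe, after which heaven is reached for sure;
    # otherwise fall back to guessing a "?" directly.
    return detour_steps * -1.0 + GAMMA * ((1 - pe) * 1.0 + pe * value_go_direct(pe))

# Robust comparison: score each plan by its WORST case over the pe interval,
# then pick the plan with the better worst case.
worst_go = min(value_go_direct(pe) for pe in PE_GRID)
worst_obs = min(value_observe_first(pe) for pe in PE_GRID)
robust_choice = "go direct" if worst_go >= worst_obs else "observe first"
print(robust_choice)
```

Under these assumed numbers the worst case for both plans occurs at the largest pe, and detouring for the observation wins — matching the intuition in the setup that observing pays off only when pe makes guessing too risky.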