Commitment Semantics for Sequential Decision Making under Reward Uncertainty
Authors: Qi Zhang, Edmund Durfee, Satinder Singh, Anna Chen, Stefan Witwicki
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now present a preliminary empirical evaluation comparing the runtime and solution quality of CCIMR to the MR and EBS algorithms on the following two sample problems. All algorithms were implemented and run on a 64-bit Windows machine with a 1.8 GHz CPU and 4 GB RAM. Figure 2: Results on the Gate Control problem. Leftmost: the expected value of the robot under various commitment probabilities with commitment time 4 and T = 5. Middle Left: with commitment time 5 and T = 5. Middle Right: the expected joint value of the robot and the user under various commitment probabilities. Rightmost: computation time as the time horizon scales up. Table 1: Results on Committed Rock Sample. |
| Researcher Affiliation | Collaboration | Qi Zhang, Edmund Durfee, Satinder Singh (University of Michigan, {qizhg,durfee,baveja}@umich.edu); Anna Chen (Quora, Inc., anna1110@gmail.com); Stefan Witwicki (Nissan Research Center, stefan.witwicki@nissan-usa.com) |
| Pseudocode | Yes | CCIMR Algorithm Description. Our Commitment Constrained Iterative Mean Reward (CCIMR) algorithm, described below, updates the robot's policy according to its reward observations while still achieving the commitment (a runnable sketch of this loop appears after the table). 1. Initialize/update reward belief. Use prior knowledge/the standard POMDP Bayes rule to establish/update the probability distribution over reward functions as $\mu_t$. 2. Update mean reward. If the belief changed, compute the mean reward as $R_{\mu_t}(s, a) = \sum_k \mu_t(k) R_k(s, a)$ (6). 3. Update optimal constrained policy. If the mean reward changed, update the current policy to $\pi_t$, ensuring $\pi_t \in \Pi\vert_{\pi_{t-1}}$: $\pi_t = \arg\max_{\pi \in \Pi\vert_{\pi_{t-1}}} V^{\pi}_{R_{\mu_t}}(s_t)$ (7). 4. Take the stochastic action prescribed by the current policy, and loop until the time horizon is reached. |
| Open Source Code | No | The paper does not contain any statement about making code open source or providing a link to a code repository. |
| Open Datasets | No | The paper refers to the “Gate Control” problem and a “variant of the Rock Sample problem [Smith and Simmons, 2004]”. These are simulated problem domains rather than publicly available datasets; the paper provides no concrete access information or standard dataset citation for them. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts for training, validation, or test sets). |
| Hardware Specification | Yes | All algorithms were implemented and run on a 64-bit Windows machine with 1.8 GHz CPU and 4 GB RAM. |
| Software Dependencies | No | The paper does not specify any software dependencies (e.g., programming languages, libraries, or solvers) with specific version numbers. |
| Experiment Setup | Yes | The left two plots compare the robot's expected value under CCIMR to that of MR and EBS under different commitment probabilities when the time horizon T is 5 and the commitment time is 4 and 5, respectively. Given any arbitrary commitment, the expected value computed by CCIMR is indeed between those of EBS and MR, with substantial improvement over MR. Under a deterministic policy, even though the commitment probabilities of opening the gate are continuous, the achievable probabilities can only be discrete; hence, the expected value over different commitment probabilities under a deterministic policy is always stepwise. Stochastic policies can do strictly better because they achieve finer tradeoffs between the commitment constraints and rewards, as shown in the middle left plot of Figure 2. To find the best commitment that optimizes the expected joint value of the robot and the user, the set of feasible commitment probabilities is discretized with granularity 0.02. We constrained the rover with T - n commitments such that the probability of leaving the region grows linearly as the time approaches the horizon: $\rho_\tau = \max\left(0, \frac{\tau - n + 1}{T - n + 1}\right)$ for all $\tau \le T$ ($\rho_\tau = 0$ when $\tau < n$ because it takes the rover at least n time steps to leave the region); a small script evaluating this schedule follows the table. |
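
The pseudocode row above describes the CCIMR loop only in prose, and the paper does not release code. The following Python sketch is an illustration of that loop under stated assumptions: the helper interfaces (`solve_constrained`, `env`), the deterministic-reward likelihood used in the Bayes update, and the `R_k[s, a]` array layout are hypothetical choices for this sketch, not the authors' implementation.

```python
import numpy as np

def ccimr_episode(prior, reward_fns, solve_constrained, env, horizon, rng=None):
    """Sketch of the CCIMR loop: act under the mean reward of the current belief
    while re-solving only within the commitment-respecting policy set."""
    rng = rng or np.random.default_rng()
    belief = np.asarray(prior, dtype=float)   # mu_t over the K candidate reward functions
    state = env.reset()
    policy = None                             # pi_{t-1}; None means any commitment-feasible policy
    for t in range(horizon):
        # Step 2: mean reward under the current belief, R_mu(s, a) = sum_k mu(k) R_k(s, a)
        mean_reward = sum(b * R for b, R in zip(belief, reward_fns))
        # Step 3: re-optimize, constrained to policies consistent with the commitment
        # and with the previously chosen policy (the set Pi | pi_{t-1} in the paper)
        policy = solve_constrained(mean_reward, state, prev_policy=policy)
        # Step 4: take the stochastic action prescribed by the current (stochastic) policy
        action_probs = policy[state]
        action = rng.choice(len(action_probs), p=action_probs)
        next_state, observed_reward = env.step(action)
        # Step 1 (next iteration): Bayes update of the belief from the reward observation.
        # Assuming deterministic rewards, only reward functions consistent with the
        # observation keep positive probability.
        likelihood = np.array([float(np.isclose(R[state, action], observed_reward))
                               for R in reward_fns])
        if likelihood.sum() > 0:
            belief = belief * likelihood
            belief /= belief.sum()
        state = next_state
    return belief
```

`solve_constrained` stands in for the paper's commitment-constrained linear-program/MDP solve; any policy it returns is assumed to already satisfy the commitment probabilities.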
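
For the Committed Rock Sample setup, the snippet below is a minimal sketch assuming the linearly growing schedule $\rho_\tau = \max(0, (\tau - n + 1)/(T - n + 1))$ stated above; the function name, the symbol $\rho_\tau$, and the example values n = 3, T = 5 are illustrative only.

```python
def commitment_probability(tau, n, T):
    """Probability that the rover has left the region by time tau, under the
    assumed schedule: 0 for tau < n, growing linearly to 1 at the horizon T."""
    return max(0.0, (tau - n + 1) / (T - n + 1))

# Example: with n = 3 and T = 5 the schedule over tau = 0..5 is
# [0.0, 0.0, 0.0, 0.333, 0.667, 1.0]
print([round(commitment_probability(tau, n=3, T=5), 3) for tau in range(6)])
```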