Multi-Objective MDPs with Conditional Lexicographic Reward Preferences
Authors: Kyle Wray, Shlomo Zilberstein, Abdel-Illah Mouaddib
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The performance of LVI in practice is tested within a realistic benchmark problem in the domain of semi-autonomous driving. Section 4 presents our experimental results. We evaluated the computation times of both weighted VI, as well as LVI run on both a CPU and GPU. |
| Researcher Affiliation | Academia | (1) School of Computer Science, University of Massachusetts, Amherst, MA, USA; (2) GREYC Laboratory, University of Caen, Basse-Normandie, France |
| Pseudocode | Yes | Algorithm 1 Lexicographic Value Iteration (LVI) |
| Open Source Code | No | The paper states, "In future work, we plan to enrich the semi-autonomous driving domain and make the scenario creation tools public to facilitate further research and comparison of algorithms for multi-objective optimization." This indicates a future plan for releasing tools, not current availability of the methodology's source code. |
| Open Datasets | Yes | We use real-world road data from Open Street Map (http://wiki.openstreetmap.org) for sections of the 10 cities in Table 1. |
| Dataset Splits | No | The paper does not specify traditional training/validation/test dataset splits. While it uses real-world road data to define the MDP environment, the evaluation is based on the algorithm's performance within these environments, not on separate data partitions for training, validation, or testing in the machine learning sense. |
| Hardware Specification | Yes | Our experiments shown in Table 1 were executed with an Intel(R) Core(TM) i7-4702HQ CPU at 2.20GHz, 8GB of RAM, and an Nvidia(R) GeForce GTX 870M graphics card using C++ and CUDA(C) 6.5. |
| Software Dependencies | Yes | Our experiments shown in Table 1 were executed with an Intel(R) Core(TM) i7-4702HQ CPU at 2.20GHz, 8GB of RAM, and an Nvidia(R) GeForce GTX 870M graphics card using C++ and CUDA(C) 6.5. |
| Experiment Setup | Yes | LMDP states are formed by a pair of intersections (previous and current), driver tiredness (true/false), and autonomy (enabled/disabled). Actions are taken at intersections... The stochasticity within state transitions model the likelihood that the driver will drift from attentive to tired, following probability of 0.1. The time cost is proportional to the time spent on the road (in seconds), plus a small constant value of 5 (seconds)... We allow for a 10 second slack in the expected time to reach the goal, in order to favor selecting autonomy-capable roads. |
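
The experiment setup quoted in the last row can be made concrete with a small sketch. The Python below is illustrative only and is not the paper's Algorithm 1 (LVI): it enumerates states as (previous intersection, current intersection, tired, autonomy) tuples, computes a time cost with the constant of 5 seconds, and makes a single-step lexicographic choice with a 10 second slack. The function names, the proportionality factor of 1.0, the action labels, and the numeric values are all hypothetical; the secondary cost simply stands in for whatever the second objective penalizes.

```python
from itertools import product
from typing import Dict, List, Tuple

# (previous intersection, current intersection, driver tired, autonomy enabled)
State = Tuple[str, str, bool, bool]


def enumerate_states(edges: List[Tuple[str, str]]) -> List[State]:
    """Build one state per road segment, tiredness flag, and autonomy flag."""
    flags = (False, True)
    return [
        (prev, cur, tired, autonomy)
        for (prev, cur), tired, autonomy in product(edges, flags, flags)
    ]


def time_cost(seconds_on_road: float, factor: float = 1.0,
              constant: float = 5.0) -> float:
    """Time cost proportional to travel time plus a small constant.

    The factor of 1.0 is an assumption; the source only says "proportional".
    """
    return factor * seconds_on_road + constant


def lexicographic_choice(expected_time: Dict[str, float],
                         secondary_cost: Dict[str, float],
                         slack: float = 10.0) -> str:
    """Pick an action by lexicographic preference with slack.

    Step 1: keep actions whose expected time is within `slack` of the best.
    Step 2: among those, minimize the secondary cost (ties broken by time).
    """
    best_time = min(expected_time.values())
    admissible = [a for a, t in expected_time.items() if t - best_time <= slack]
    return min(admissible, key=lambda a: (secondary_cost[a], expected_time[a]))


# Hypothetical values for three candidate routes out of one intersection.
expected_time = {"highway": 100.0, "side_street": 108.0, "detour": 125.0}
secondary_cost = {"highway": 40.0, "side_street": 15.0, "detour": 0.0}

with_slack = lexicographic_choice(expected_time, secondary_cost, slack=10.0)
no_slack = lexicographic_choice(expected_time, secondary_cost, slack=0.0)
states = enumerate_states([("A", "B"), ("B", "C")])
```

With these made-up numbers, a slack of 10 seconds admits both `highway` and `side_street`, so the secondary objective decides between them; with zero slack only the fastest action remains. The `detour` action is never admissible because it exceeds the best expected time by more than the slack.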