Augmenting Markov Decision Processes with Advising
Authors: Loïs Vanhée, Laurent Jeanpierre, Abdel-Illah Mouaddib
Pages: 2531-2538
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper details the Advice-MDP formalism, a fast Advice MDP resolution algorithm, and its applicability for real-world tasks, via the design of a professional-class semi-autonomous robot system ready to be deployed in a wide range of unexpected environments and capable of efficiently integrating operator advising. Finally, this paper demonstrates the relevance of Advice MDPs for solving real-world problems, by deploying them for a professional-class application. Empirical Evaluation: We compared Advice-MDPs against Fully-Autonomous Systems (FAS, based on classic advice-less MDPs) and Non-Autonomous systems (i.e. teleoperation)... Experimental results (Table 1) detail the compromises between efficiency, flexibility, and OW costs. |
| Researcher Affiliation | Academia | Loïs Vanhée, Laurent Jeanpierre, Abdel-Illah Mouaddib GREYC, Université de Caen, France Contact author: lois.vanhee@unicaen.fr. |
| Pseudocode | Yes | Algorithm 1: Fast Advice-MDP policy computer |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the described methodology. It only provides a link to 'Demonstration videos'. |
| Open Datasets | No | The paper describes using NERVA robots in custom scenarios ('corridor scenario', 'hole scenario') and generating maps via SLAM, rather than using a pre-existing, publicly available dataset with concrete access information (link, DOI, or formal citation). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. It describes experimental scenarios but not data partitioning for model training or evaluation in a traditional sense. |
| Hardware Specification | No | The paper describes the NERVA robots used as the platform ('equipped with four cameras and a wide array of specific sensors'), but it does not specify the computing hardware (e.g., CPU, GPU, memory, or cloud instances) used to run the Advice-MDP algorithm or perform computations for the experiments. |
| Software Dependencies | No | The paper refers to concepts and existing theoretical frameworks (e.g., 'Markov Decision Processes', 'Ordered Weighted Regret', 'Simultaneous Localization and Mapping') but does not list specific software libraries, tools, or their version numbers that would be required to reproduce the experiments. |
| Experiment Setup | No | The paper describes the environment setup (e.g., '4096 × 4096 pixel map', '400 × 400 tiles hexagonal grid') and general experimental procedures ('Each experiment was repeated 20 times'), but it does not provide specific algorithm parameters or hyperparameters (e.g., learning rates, batch sizes, or optimization settings) that constitute a detailed experimental setup for reproducibility. |