Augmenting Markov Decision Processes with Advising

Authors: Loïs Vanhée, Laurent Jeanpierre, Abdel-Illah Mouaddib
Pages: 2531-2538

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper details the Advice-MDP formalism, a fast Advice MDP resolution algorithm, and its applicability for real-world tasks, via the design of a professional-class semi-autonomous robot system ready to be deployed in a wide range of unexpected environments and capable of efficiently integrating operator advising. Finally, this paper demonstrates the relevance of Advice MDPs for solving real-world problems, by deploying them for a professional-class application. Empirical Evaluation: We compared Advice-MDPs against Fully-Autonomous Systems (FAS, based on classic advice-less MDPs) and Non-Autonomous systems (i.e. teleoperation)... Experimental results (Table 1) detail the compromises between efficiency, flexibility, and OW costs.
Researcher Affiliation | Academia | Loïs Vanhée, Laurent Jeanpierre, Abdel-Illah Mouaddib; GREYC, Université de Caen, France; contact author: lois.vanhee@unicaen.fr.
Pseudocode | Yes | Algorithm 1: Fast Advice-MDP policy computer
Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the described methodology. It only provides a link to 'Demonstration videos'.
Open Datasets | No | The paper describes using NERVA robots in custom scenarios ('corridor scenario', 'hole scenario') and generating maps via SLAM, rather than using a pre-existing, publicly available dataset with concrete access information (link, DOI, or formal citation).
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits. It describes experimental scenarios but not data partitioning for model training or evaluation in a traditional sense.
Hardware Specification | No | The paper describes the NERVA robots used as the platform ('equipped with four cameras and a wide array of specific sensors'), but it does not specify the computing hardware (e.g., CPU, GPU, memory, or cloud instances) used to run the Advice-MDP algorithm or perform computations for the experiments.
Software Dependencies | No | The paper refers to concepts and existing theoretical frameworks (e.g., 'Markov Decision Processes', 'Ordered Weighted Regret', 'Simultaneous Localization and Mapping') but does not list specific software libraries, tools, or their version numbers that would be required to reproduce the experiments.
Experiment Setup | No | The paper describes the environment setup (e.g., '4096 × 4096 pixel map', '400 × 400 tiles hexagonal grid') and general experimental procedures ('Each experiment was repeated 20 times'), but it does not provide specific algorithm parameters or hyperparameters (e.g., learning rates, batch sizes, or optimization settings) that constitute a detailed experimental setup for reproducibility.