An Experimental Study of Advice in Sequential Decision-Making Under Uncertainty

Authors: Florian Benavent, Bruno Zanuttini

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We then report on an experimental study of the amount of advice needed for the agent to compute a good policy. Our study shows in particular that continual interaction between the user and the agent is worthwhile, and sheds light on the pros and cons of each type of advice." and, from the Experimental Results section, "We now report on experiments on synthetic MDPs, aimed at evaluating advice along the following dimensions:"
Researcher Affiliation | Academia | "Florian Benavent, Bruno Zanuttini, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France"
Pseudocode | Yes | "Figure 1: Master problem (top) and subproblem (bottom) for M = ⟨S, A, T, R, γ⟩ with R given by Cr ≤ d." (see the regret sketch below the table)
Open Source Code | No | The paper provides no concrete access information (link or explicit statement) for the source code of the described methodology.
Open Datasets | No | "We first ran experiments on generic MDPs, randomly generated using the same procedure as Regan and Boutilier (2009). Precisely, we generated random MDPs with 20 to 50 states and with 2 to 4 different actions available at each state."
Dataset Splits | No | The paper describes how the MDPs were randomly generated but does not specify train, validation, or test splits (percentages or counts), nor does it reference predefined splits for reproducibility.
Hardware Specification | No | The paper gives no details about the hardware (e.g., CPU or GPU models, cloud resources) used to run the experiments.
Software Dependencies | No | The paper lists no ancillary software, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "Precisely, we generated random MDPs with 20 to 50 states and with 2 to 4 different actions available at each state. The transition function was generated by drawing, for each pair (s, a), log(|S| |A|) reachable states, and the probability of each was generated from a Gaussian. ... We ran 50 simulations with different settings, each one for 10 iterations in iterative scenarios." (see the generation sketch below the table)
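
The Figure 1 caption quoted in the Pseudocode row refers to the constraint-generation scheme of Regan and Boutilier (2009) for computing minimax regret when the reward is only known to satisfy Cr ≤ d. The following is a hedged sketch of that decomposition in the standard visitation-frequency formulation; the notation (F, GEN, δ) is assumed here, and the exact constraints in the paper's Figure 1 may differ.

\[
\mathrm{MMR}(\mathcal{R}) \;=\; \min_{f \in \mathcal{F}} \; \max_{r \in \mathcal{R}} \; \max_{g \in \mathcal{F}} \big( r^\top g - r^\top f \big),
\qquad
\mathcal{R} = \{\, r : C r \le d \,\},
\]

where \(\mathcal{F}\) is the set of valid discounted state-action visitation frequencies of \(M = \langle S, A, T, R, \gamma \rangle\). The master problem minimizes an upper bound \(\delta\) on the regret of \(f\) against a finite set \(\mathrm{GEN}\) of adversarial pairs,

\[
\min_{f \in \mathcal{F},\, \delta} \; \delta
\quad \text{s.t.} \quad r^\top g - r^\top f \le \delta \;\; \text{for all } (g, r) \in \mathrm{GEN},
\]

while the subproblem, given the current \(f\), searches for a maximally violated pair \(\arg\max_{r \in \mathcal{R},\, g \in \mathcal{F}} r^\top (g - f)\) to add to \(\mathrm{GEN}\); the loop stops when the subproblem value no longer exceeds \(\delta\).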
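
The generation procedure quoted in the Open Datasets and Experiment Setup rows can be turned into a short sketch. The snippet below is a minimal Python/NumPy interpretation, not the authors' code: the Gaussian parameters, the reward range, and the use of one action count per MDP (rather than per state) are assumptions the paper does not pin down.

import math
import numpy as np

def random_mdp(rng, gamma=0.95):
    """Generate one random MDP following the procedure quoted above
    (after Regan and Boutilier 2009): 20 to 50 states, 2 to 4 actions,
    and log(|S|*|A|) reachable successors per (s, a) pair, with
    probabilities derived from Gaussian draws."""
    S = int(rng.integers(20, 51))             # 20 to 50 states
    A = int(rng.integers(2, 5))               # 2 to 4 actions (fixed per MDP here; assumption)
    n_succ = max(1, round(math.log(S * A)))   # reachable successors per (s, a)

    T = np.zeros((S, A, S))                   # transition function T[s, a, s']
    for s in range(S):
        for a in range(A):
            succ = rng.choice(S, size=n_succ, replace=False)
            w = np.abs(rng.normal(1.0, 0.5, size=n_succ))  # Gaussian weights (assumed parameters)
            T[s, a, succ] = w / w.sum()       # normalize into a probability distribution

    R = rng.uniform(0.0, 1.0, size=(S, A))    # nominal reward; the paper studies imprecise rewards
    return T, R, gamma

# 50 simulations with different settings, each run for 10 iterations
# in the iterative scenarios, as quoted in the Experiment Setup row.
rng = np.random.default_rng(0)
mdps = [random_mdp(rng) for _ in range(50)]

Drawing positive weights from a Gaussian and normalizing them is one plausible reading of "the probability of each was generated from a Gaussian"; other normalization choices would match the quoted description equally well.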