An Experimental Study of Advice in Sequential Decision-Making Under Uncertainty
Authors: Florian Benavent, Bruno Zanuttini
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We then report on an experimental study of the amount of advice needed for the agent to compute a good policy. Our study shows in particular that continual interaction between the user and the agent is worthwhile, and sheds light on the pros and cons of each type of advice." ... "Experimental Results: We now report on experiments on synthetic MDPs, aimed at evaluating advice along the following dimensions:" |
| Researcher Affiliation | Academia | Florian Benavent, Bruno Zanuttini Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France |
| Pseudocode | Yes | Figure 1: Master problem (top) and subproblem (bottom) for M = ⟨S, A, T, R, γ⟩ with R given by C r ≤ d. |
| Open Source Code | No | The paper does not provide any concrete access information (link, explicit statement) to the source code for the methodology described. |
| Open Datasets | No | We first ran experiments on generic MDPs, randomly generated using the same procedure as Regan and Boutilier (2009). Precisely, we generated random MDPs with 20 to 50 states and with 2 to 4 different actions available at each state. |
| Dataset Splits | No | The paper describes how the MDPs were randomly generated but does not specify train, validation, or test dataset splits (percentages or counts) or reference predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, cloud resources) used to conduct the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | Precisely, we generated random MDPs with 20 to 50 states and with 2 to 4 different actions available at each state. The transition function was generated by drawing, for each pair (s, a), log(|S| · |A|) reachable states, and the probability of each was generated from a Gaussian. ... We ran 50 simulations with different settings, each one for 10 iterations in iterative scenarios. |
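The generation procedure quoted in the Experiment Setup row can be sketched in Python. This is an illustrative reconstruction, not the authors' code: the function name `generate_random_mdp`, the use of absolute Gaussian draws, and the normalisation step are assumptions, since the paper does not state how the Gaussian samples are turned into a probability distribution.

```python
import math
import random

def generate_random_mdp(seed=None):
    """Sketch of the random-MDP generation described in the paper
    (following Regan and Boutilier, 2009). Details such as the
    normalisation of Gaussian draws are assumptions."""
    rng = random.Random(seed)
    # 20 to 50 states, 2 to 4 actions available at each state
    n_states = rng.randint(20, 50)
    n_actions = {s: rng.randint(2, 4) for s in range(n_states)}

    transitions = {}
    for s in range(n_states):
        for a in range(n_actions[s]):
            # draw roughly log(|S| * |A|) reachable successor states
            k = max(1, round(math.log(n_states * n_actions[s])))
            successors = rng.sample(range(n_states), k)
            # weight each successor with a Gaussian draw, then
            # normalise so the weights form a probability distribution
            weights = [abs(rng.gauss(0, 1)) for _ in successors]
            total = sum(weights)
            transitions[(s, a)] = {
                sp: w / total for sp, w in zip(successors, weights)
            }
    return n_states, n_actions, transitions
```

A quick check that the generated transition function is a valid stochastic kernel (each row sums to 1) mirrors the sanity checks one would want before reproducing the paper's 50-simulation runs.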