Approximate Policy Iteration Schemes: A Comparison

Authors: Bruno Scherrer

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Simulations with these schemes confirm our analysis. Section 4 will go on by providing experiments that will illustrate their behavior, and confirm our analysis."
Researcher Affiliation | Academia | Bruno Scherrer BRUNO.SCHERRER@INRIA.FR, Inria, Villers-lès-Nancy, F-54600, France; Université de Lorraine, LORIA, UMR 7503, Vandœuvre-lès-Nancy, F-54506, France
Pseudocode | No | The paper describes algorithms using equations and textual explanations, but no formal pseudocode or algorithm blocks are provided.
Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories for the methodology described.
Open Datasets | Yes | "More precisely, we consider Garnet problems first introduced by Archibald et al. (1995), which are a class of randomly constructed finite MDPs."
Dataset Splits | No | The paper mentions generating Garnet MDPs and running algorithms, but does not specify train, validation, or test splits for data, nor does it describe a cross-validation setup.
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments or simulations.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers.
Experiment Setup | Yes | "The greedy step used by all algorithms is approximated by an exact greedy operator applied to a noisy orthogonal projection on a linear space of dimension |S| / 10 with respect to the quadratic norm weighted by ν or d_{ν,π} (for CPI+ and CPI(α)) where ν is uniform. For each run j and algorithm, we compute for all iterations k ∈ (1, 100) the performance... We considered the standard API as a baseline. ... we considered two variations: CPI+ that is identical to CPI except that it chooses the step αk at each iteration by doing a line-search... and CPI(α) with α = 0.1... we also considered API(α) with α = 0.1... In addition to these algorithms, we considered PSDP and NSPI(m) for the values m ∈ {5, 10, 30}."
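Since the paper releases no code, a reproduction would have to rebuild the Garnet benchmark from the textual description. The sketch below is a minimal, hedged reconstruction: `make_garnet` follows the standard Garnet construction (random branching-factor transitions, random rewards), and `exact_policy_iteration` is a stand-in for the paper's API baseline with an *exact* greedy step; the paper's noisy orthogonal projection onto a |S|/10-dimensional linear space is omitted. All function names, sizes, and the discount factor are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def make_garnet(n_states=50, n_actions=4, branching=3, seed=0):
    """Randomly construct a Garnet MDP (Archibald et al., 1995):
    each (state, action) pair reaches `branching` distinct next states
    with random probabilities and yields a random reward in [0, 1].
    Sizes and seed are illustrative, not taken from the paper."""
    rng = np.random.default_rng(seed)
    P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] transition kernel
    R = rng.uniform(size=(n_states, n_actions))    # R[s, a] reward
    for a in range(n_actions):
        for s in range(n_states):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[a, s, succ] = rng.dirichlet(np.ones(branching))
    return P, R

def exact_policy_iteration(P, R, gamma=0.9, n_iter=100):
    """Exact policy iteration on a finite MDP: a stand-in for the API
    baseline, without the paper's noisy projection in the greedy step."""
    n_actions, n_states, _ = P.shape
    pi = np.zeros(n_states, dtype=int)  # deterministic policy: state -> action
    for _ in range(n_iter):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[pi, np.arange(n_states)]   # (n_states, n_states)
        r_pi = R[np.arange(n_states), pi]   # (n_states,)
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Greedy improvement: pick argmax_a Q(s, a) in every state.
        Q = R.T + gamma * P @ v             # (n_actions, n_states)
        new_pi = Q.argmax(axis=0)
        if np.array_equal(new_pi, pi):
            break                           # greedy policy is stable: optimal
        pi = new_pi
    return pi, v
```

The conservative variants CPI(α) and API(α) described above would replace the hard `argmax` update with a stochastic mixture π ← (1 − α)π + α·greedy(v), with α = 0.1 in the paper's runs.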