Approximate Policy Iteration Schemes: A Comparison
Authors: Bruno Scherrer
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations with these schemes confirm our analysis. ... Section 4 will go on by providing experiments that will illustrate their behavior, and confirm our analysis. |
| Researcher Affiliation | Academia | Bruno Scherrer BRUNO.SCHERRER@INRIA.FR Inria, Villers-lès-Nancy, F-54600, France Université de Lorraine, LORIA, UMR 7503, Vandœuvre-lès-Nancy, F-54506, France |
| Pseudocode | No | The paper describes algorithms using equations and textual explanations, but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories for the methodology described. |
| Open Datasets | Yes | More precisely, we consider Garnet problems first introduced by Archibald et al. (1995), which are a class of randomly constructed finite MDPs. |
| Dataset Splits | No | The paper mentions generating Garnet MDPs and running algorithms, but does not specify train, validation, or test splits for data, nor does it describe a cross-validation setup. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers. |
| Experiment Setup | Yes | The greedy step used by all algorithms is approximated by an exact greedy operator applied to a noisy orthogonal projection on a linear space of dimension |S| / 10 with respect to the quadratic norm weighted by ν or dν,π (for CPI+ and CPI(α)) where ν is uniform. For each run j and algorithm, we compute for all iterations k ∈ (1, 100) the performance... We considered the standard API as a baseline. ... we considered two variations: CPI+ that is identical to CPI except that it chooses the step αk at each iteration by doing a line-search... and CPI(α) with α = 0.1... we also considered API(α) with α = 0.1... In addition to these algorithms, we considered PSDP and NSPI(m) for the values m ∈ {5, 10, 30}. |
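
For readers who want to approximate the quoted setup, the sketch below is a minimal, hypothetical illustration (not the author's code) of its main ingredients: a Garnet-style random MDP in the sense of Archibald et al. (1995), an approximate greedy step obtained by applying an exact greedy operator to a noisy ν-weighted projection of the value function onto a random linear space of dimension |S|/10, and the conservative mixture update of CPI(α). The branching factor, reward distribution, noise level, discount factor, and feature construction are all assumptions, and the paper's CPI variants weight the projection by dν,π, which this sketch simplifies to the uniform ν.

```python
import numpy as np

def make_garnet(n_states=100, n_actions=4, branching=2, rng=None):
    """Randomly construct a finite Garnet-style MDP (Archibald et al., 1995).

    Each (state, action) pair reaches `branching` randomly chosen next states
    with Dirichlet-distributed probabilities and receives a random reward
    (modelling assumptions, not the paper's exact construction).
    """
    rng = np.random.default_rng(rng)
    P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] transition kernel
    R = rng.uniform(size=(n_states, n_actions))    # R[s, a] rewards
    for s in range(n_states):
        for a in range(n_actions):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[a, s, succ] = rng.dirichlet(np.ones(branching))
    return P, R

def policy_value(P, R, pi, gamma=0.99):
    """Exact value v_pi of a stochastic policy pi[s, a] (tabular linear solve)."""
    n_states = R.shape[0]
    P_pi = np.einsum('sa,ast->st', pi, P)          # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def approx_greedy(P, R, v, nu, n_features, gamma=0.99, noise=0.05, rng=None):
    """Exact greedy operator applied to a noisy nu-weighted projection of v
    onto a random linear space of dimension n_features (assumed construction)."""
    rng = np.random.default_rng(rng)
    n_states, n_actions = R.shape
    Phi = rng.normal(size=(n_states, n_features))  # random features spanning the space
    D = np.diag(nu)
    target = v + noise * rng.normal(size=n_states)
    w = np.linalg.solve(Phi.T @ D @ Phi, Phi.T @ D @ target)
    v_hat = Phi @ w                                # nu-weighted orthogonal projection
    Q = R + gamma * np.einsum('ast,t->sa', P, v_hat)
    greedy = np.zeros((n_states, n_actions))
    greedy[np.arange(n_states), Q.argmax(axis=1)] = 1.0
    return greedy

def cpi_alpha(P, R, alpha=0.1, n_iter=100, gamma=0.99, seed=0):
    """CPI(alpha): conservative mixture update pi <- (1 - alpha) * pi + alpha * greedy.
    Setting alpha = 1 recovers the standard API update as a special case."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    nu = np.full(n_states, 1.0 / n_states)         # uniform distribution nu
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(n_iter):
        v = policy_value(P, R, pi, gamma)
        g = approx_greedy(P, R, v, nu, n_features=max(1, n_states // 10),
                          gamma=gamma, rng=rng)
        pi = (1.0 - alpha) * pi + alpha * g
    return pi

if __name__ == "__main__":
    P, R = make_garnet(rng=0)
    pi = cpi_alpha(P, R, alpha=0.1)
    print("mean value of final policy:", policy_value(P, R, pi).mean())
```

The other schemes mentioned in the quote (CPI+ with a line-search step, API(α), PSDP, NSPI(m)) differ only in how the step size or the sequence of greedy policies is combined, so they could be slotted into the same loop; this sketch fixes α = 0.1 as in the quoted CPI(α) and API(α) configurations.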