Approximate Policy Iteration Schemes: A Comparison
Authors: Bruno Scherrer
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulations with these schemes confirm our analysis. ... Section 4 will go on by providing experiments that will illustrate their behavior, and confirm our analysis. |
| Researcher Affiliation | Academia | Bruno Scherrer BRUNO.SCHERRER@INRIA.FR Inria, Villers-lès-Nancy, F-54600, France Université de Lorraine, LORIA, UMR 7503, Vandœuvre-lès-Nancy, F-54506, France |
| Pseudocode | No | The paper describes algorithms using equations and textual explanations, but no formal pseudocode or algorithm blocks are provided. |
| Open Source Code | No | The paper does not provide any statements about releasing code or links to source code repositories for the methodology described. |
| Open Datasets | Yes | More precisely, we consider Garnet problems first introduced by Archibald et al. (1995), which are a class of randomly constructed finite MDPs. |
| Dataset Splits | No | The paper mentions generating Garnet MDPs and running algorithms, but does not specify train, validation, or test splits for data, nor does it describe a cross-validation setup. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments or simulations. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers. |
| Experiment Setup | Yes | The greedy step used by all algorithms is approximated by an exact greedy operator applied to a noisy orthogonal projection on a linear space of dimension |S| / 10 with respect to the quadratic norm weighted by ν or dν,π (for CPI+ and CPI(α)) where ν is uniform. For each run j and algorithm, we compute for all iterations k ∈ (1, 100) the performance... We considered the standard API as a baseline. ... we considered two variations: CPI+ that is identical to CPI except that it chooses the step αk at each iteration by doing a line-search... and CPI(α) with α = 0.1... we also considered API(α) with α = 0.1... In addition to these algorithms, we considered PSDP and NSPI(m) for the values m ∈ {5, 10, 30}. |
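
For readers who want to approximate the quoted setup, the sketch below is a minimal, hypothetical illustration (not the author's code) of its main ingredients: a Garnet-style random MDP in the sense of Archibald et al. (1995), an approximate greedy step obtained by applying an exact greedy operator to a noisy ν-weighted projection of the value function onto a random linear space of dimension |S|/10, and the conservative mixture update of CPI(α). The branching factor, reward distribution, noise level, discount factor, and feature construction are all assumptions, and the paper's CPI variants weight the projection by dν,π, which this sketch simplifies to the uniform ν.

```python
import numpy as np

def make_garnet(n_states=100, n_actions=4, branching=2, rng=None):
    """Randomly construct a finite Garnet-style MDP (Archibald et al., 1995).

    Each (state, action) pair reaches `branching` randomly chosen next states
    with Dirichlet-distributed probabilities and receives a random reward
    (modelling assumptions, not the paper's exact construction).
    """
    rng = np.random.default_rng(rng)
    P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] transition kernel
    R = rng.uniform(size=(n_states, n_actions))    # R[s, a] rewards
    for s in range(n_states):
        for a in range(n_actions):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[a, s, succ] = rng.dirichlet(np.ones(branching))
    return P, R

def policy_value(P, R, pi, gamma=0.99):
    """Exact value v_pi of a stochastic policy pi[s, a] (tabular linear solve)."""
    n_states = R.shape[0]
    P_pi = np.einsum('sa,ast->st', pi, P)          # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def approx_greedy(P, R, v, nu, n_features, gamma=0.99, noise=0.05, rng=None):
    """Exact greedy operator applied to a noisy nu-weighted projection of v
    onto a random linear space of dimension n_features (assumed construction)."""
    rng = np.random.default_rng(rng)
    n_states, n_actions = R.shape
    Phi = rng.normal(size=(n_states, n_features))  # random features spanning the space
    D = np.diag(nu)
    target = v + noise * rng.normal(size=n_states)
    w = np.linalg.solve(Phi.T @ D @ Phi, Phi.T @ D @ target)
    v_hat = Phi @ w                                # nu-weighted orthogonal projection
    Q = R + gamma * np.einsum('ast,t->sa', P, v_hat)
    greedy = np.zeros((n_states, n_actions))
    greedy[np.arange(n_states), Q.argmax(axis=1)] = 1.0
    return greedy

def cpi_alpha(P, R, alpha=0.1, n_iter=100, gamma=0.99, seed=0):
    """CPI(alpha): conservative mixture update pi <- (1 - alpha) * pi + alpha * greedy.
    Setting alpha = 1 recovers the standard API update as a special case."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    nu = np.full(n_states, 1.0 / n_states)         # uniform distribution nu
    pi = np.full((n_states, n_actions), 1.0 / n_actions)
    for _ in range(n_iter):
        v = policy_value(P, R, pi, gamma)
        g = approx_greedy(P, R, v, nu, n_features=max(1, n_states // 10),
                          gamma=gamma, rng=rng)
        pi = (1.0 - alpha) * pi + alpha * g
    return pi

if __name__ == "__main__":
    P, R = make_garnet(rng=0)
    pi = cpi_alpha(P, R, alpha=0.1)
    print("mean value of final policy:", policy_value(P, R, pi).mean())
```

The other schemes mentioned in the quote (CPI+ with a line-search step, API(α), PSDP, NSPI(m)) differ only in how the step size or the sequence of greedy policies is combined, so they could be slotted into the same loop; this sketch fixes α = 0.1 as in the quoted CPI(α) and API(α) configurations.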