Non-Stationary Approximate Modified Policy Iteration

Authors: Boris Lesner, Bruno Scherrer

ICML 2015

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we describe an empirical illustration of the new algorithm NS-AMPI. Note that the goal here is not to convince the reader that the new degrees of freedom for approximate dynamic programming may be interesting in difficult real control problems – we leave this important question to future work – but rather to give some insight, on small and artificial well-controlled problems, on the effect of the main parameters m and ℓ. We evaluated the empirical performance gain of using nonstationary policies by implementing the algorithm using random error vectors ϵ_k, with each component being uniformly random between 0 and some user-supplied value ϵ. The adjustable size (with n) of the state and actions spaces allowed to compute an optimal policy to compare with the approximate ones generated by NS-AMPI for all combinations of parameters ℓ ∈ {1, 2, 5, 10} and m ∈ {1, 2, 5, 10, 25, ∞}. Recall that the cases m = 1 and m = ∞ correspond respectively to the NS-VI and NS-PI, while the case ℓ = 1 corresponds to AMPI. We used n = 8 locations, γ = 0.98 and ϵ = 4 in all experiments. Figure 2 shows the average value of the error v∗ − v_{π_{k,ℓ}} per iteration for the different values of parameters m and ℓ. For each parameter combination, the results are obtained by averaging over 250 runs.
Researcher Affiliation Academia Inria, Villers-lès-Nancy, F-54600, France; Université de Lorraine, LORIA, UMR 7503, Vandœuvre-lès-Nancy, F-54506, France
Pseudocode No The paper describes algorithmic schemes (AMPI, NS-AMPI) using mathematical notation and textual descriptions of steps, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code No The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets Yes The problem we consider, the dynamic location problem from Bertsekas & Yu (2012), involves a repairman moving between n sites according to some transition probabilities.
Dataset Splits No The paper mentions using a 'dynamic location problem' and performing '250 runs' for averaging results but does not specify training, validation, or test dataset splits.
Hardware Specification No The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment.
Experiment Setup Yes We used n = 8 locations, γ = 0.98 and ϵ = 4 in all experiments.
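
The protocol quoted in the Research Type row can be sketched for the ℓ = 1 case only, which the row identifies as plain AMPI. This is a minimal illustration under stated assumptions, not the authors' code: the MDP is a random stand-in rather than the dynamic location problem, all function names are hypothetical, the update is the standard AMPI step (greedy policy, then m applications of its Bellman operator, then uniform noise in [0, ϵ]), the error v∗ − v_π is summarized as its mean over states, and the non-stationary (ℓ > 1) part is omitted. Passing m=None stands for m = ∞ (exact policy evaluation).

```python
import random

def random_mdp(n_states, n_actions, rng):
    """Hypothetical stand-in MDP: random transition rows, rewards in [0, 1]."""
    P, R = [], []  # P[a][s] is a distribution over next states, R[a][s] a reward
    for _ in range(n_actions):
        rows = []
        for _ in range(n_states):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_states)])
    return P, R

def backup(P, R, gamma, v, s, a):
    """One-step lookahead value of action a in state s."""
    return R[a][s] + gamma * sum(p * x for p, x in zip(P[a][s], v))

def greedy(P, R, gamma, v):
    """Greedy policy with respect to v (ties go to the lowest action index)."""
    return [max(range(len(P)), key=lambda a: backup(P, R, gamma, v, s, a))
            for s in range(len(v))]

def apply_T_pi(P, R, gamma, pi, v):
    """Apply the Bellman operator of the stationary policy pi once."""
    return [backup(P, R, gamma, v, s, pi[s]) for s in range(len(v))]

def policy_value(P, R, gamma, pi, tol=1e-9):
    """Approximate v_pi by iterating T_pi until successive iterates agree."""
    v = [0.0] * len(pi)
    while True:
        new = apply_T_pi(P, R, gamma, pi, v)
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new

def optimal_value(P, R, gamma, tol=1e-9):
    """Approximate v* by (noise-free) value iteration."""
    n = len(R[0])
    v = [0.0] * n
    while True:
        new = [max(backup(P, R, gamma, v, s, a) for a in range(len(P)))
               for s in range(n)]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new

def noisy_ampi(P, R, gamma, m, eps, iterations, rng):
    """Noisy AMPI (the l = 1 case); returns the mean of v* - v_pi per iteration."""
    n = len(R[0])
    v_star = optimal_value(P, R, gamma)
    v = [0.0] * n
    errors = []
    for _ in range(iterations):
        pi = greedy(P, R, gamma, v)
        if m is None:  # m = infinity: evaluate the greedy policy exactly
            v = policy_value(P, R, gamma, pi)
        else:          # m applications of T_pi
            for _ in range(m):
                v = apply_T_pi(P, R, gamma, pi, v)
        # Error vector: each component uniform in [0, eps], as in the quoted setup.
        v = [x + rng.uniform(0.0, eps) for x in v]
        v_pi = policy_value(P, R, gamma, pi)
        errors.append(sum(a - b for a, b in zip(v_star, v_pi)) / n)
    return errors

if __name__ == "__main__":
    rng = random.Random(0)
    P, R = random_mdp(n_states=8, n_actions=3, rng=rng)
    for m in (1, 5, None):
        errs = noisy_ampi(P, R, gamma=0.98, m=m, eps=4.0, iterations=10, rng=rng)
        print("m =", m, "final mean error:", round(errs[-1], 3))
```

To mimic the averaging described above, one would call `noisy_ampi` repeatedly (250 times in the quoted setup) with fresh noise and average the per-iteration errors. The numbers it produces are not comparable to the paper's Figure 2, since the underlying problem differs.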