Non-Stationary Approximate Modified Policy Iteration
Authors: Boris Lesner, Bruno Scherrer
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we describe an empirical illustration of the new algorithm NS-AMPI. Note that the goal here is not to convince the reader that the new degrees of freedom for approximate dynamic programming may be interesting in difficult real control problems (we leave this important question to future work), but rather to give some insight, on small, artificial and well-controlled problems, into the effect of the main parameters m and ℓ. We evaluated the empirical performance gain of using nonstationary policies by implementing the algorithm with random error vectors ϵ_k, each component being drawn uniformly at random between 0 and some user-supplied value ϵ. Because the sizes of the state and action spaces are adjustable (through n), we could compute an optimal policy and compare it with the approximate ones generated by NS-AMPI for all combinations of parameters ℓ ∈ {1, 2, 5, 10} and m ∈ {1, 2, 5, 10, 25, ∞}. Recall that the cases m = 1 and m = ∞ correspond respectively to NS-VI and NS-PI, while the case ℓ = 1 corresponds to AMPI. We used n = 8 locations, γ = 0.98 and ϵ = 4 in all experiments. Figure 2 shows the average value of the error ‖v* − v^{π_{k,ℓ}}‖ per iteration for the different values of parameters m and ℓ. For each parameter combination, the results are obtained by averaging over 250 runs. |
| Researcher Affiliation | Academia | Inria, Villers-lès-Nancy, F-54600, France; Université de Lorraine, LORIA, UMR 7503, Vandœuvre-lès-Nancy, F-54506, France |
| Pseudocode | No | The paper describes algorithmic schemes (AMPI, NS-AMPI) using mathematical notation and textual descriptions of steps, but it does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | The problem we consider, the dynamic location problem from Bertsekas & Yu (2012), involves a repairman moving between n sites according to some transition probabilities. |
| Dataset Splits | No | The paper mentions using a 'dynamic location problem' and performing '250 runs' for averaging results but does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiment. |
| Experiment Setup | Yes | We used n = 8 locations, γ = 0.98 and ϵ = 4 in all experiments. |
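The experimental protocol quoted in the Research Type row can be sketched in a few dozen lines. The following is a minimal, dependency-free Python sketch resting on several assumptions that are not taken from the paper. First, the dynamic location problem of Bertsekas & Yu (2012) is replaced by a hypothetical random MDP with n states and n actions (`random_mdp`). Second, the NS-AMPI evaluation step is assumed to apply m Bellman operators cycling through the ℓ most recent greedy policies, v_{k+1} = T_{π_{k+1}} T_{π_k} ⋯ v_k (m factors in total) + ϵ_{k+1}; this reproduces the special cases stated above (m = 1 gives NS-VI, m = ∞ gives NS-PI, ℓ = 1 gives AMPI) but may differ in detail from the authors' exact formulation. Third, the random error is added only to the evaluation step, the loss is measured in max-norm, and fewer than ℓ policies are cycled during the first iterations. All function names are hypothetical.

```python
import random

# ASSUMPTION: hypothetical stand-in for the dynamic location problem,
# a random MDP with n states, n actions and rewards in [0, 1].
def random_mdp(n, seed=None):
    rng = random.Random(seed)
    R = [[rng.random() for _ in range(n)] for _ in range(n)]  # R[s][a]
    P = [[None] * n for _ in range(n)]  # P[s][a] = next-state distribution
    for s in range(n):
        for a in range(n):
            w = [rng.random() for _ in range(n)]
            total = sum(w)
            P[s][a] = [x / total for x in w]
    return P, R

def q_value(P, R, gamma, v, s, a):
    return R[s][a] + gamma * sum(p * vt for p, vt in zip(P[s][a], v))

def apply_policy(P, R, gamma, v, pi):
    # Bellman operator T_pi of a deterministic stationary policy pi.
    return [q_value(P, R, gamma, v, s, pi[s]) for s in range(len(v))]

def bellman_opt(P, R, gamma, v):
    # Bellman optimality operator T.
    return [max(q_value(P, R, gamma, v, s, a) for a in range(len(R[s])))
            for s in range(len(v))]

def greedy(P, R, gamma, v):
    return [max(range(len(R[s])), key=lambda a: q_value(P, R, gamma, v, s, a))
            for s in range(len(v))]

def sup_dist(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def optimal_value(P, R, gamma, tol=1e-9):
    # Value iteration, used only to obtain v* for measuring the loss.
    v = [0.0] * len(R)
    while True:
        new = bellman_opt(P, R, gamma, v)
        if sup_dist(new, v) < tol:
            return new
        v = new

def apply_cycle(P, R, gamma, v, policies, m):
    # ASSUMED NS-AMPI evaluation step: T_{policies[0]} T_{policies[1]} ... v,
    # m operators in total, cycling through `policies` (most recent first).
    # The innermost operator acts first, so the cycle is walked backwards.
    for j in reversed(range(m)):
        v = apply_policy(P, R, gamma, v, policies[j % len(policies)])
    return v

def periodic_value(P, R, gamma, policies, tol=1e-9):
    # Value of the nonstationary policy looping over `policies`, i.e. the
    # fixed point of T_{policies[0]} ... T_{policies[-1]}.
    v = [0.0] * len(R)
    while True:
        new = apply_cycle(P, R, gamma, v, policies, len(policies))
        if sup_dist(new, v) < tol:
            return new
        v = new

def ns_ampi(P, R, gamma, v_star, m, ell, eps, n_iter, rng):
    # m=None stands for m = infinity (exact evaluation of the periodic policy).
    v, policies, errors = [0.0] * len(R), [], []
    for _ in range(n_iter):
        pi = greedy(P, R, gamma, v)
        policies = ([pi] + policies)[:ell]  # [pi_k, pi_{k-1}, ..., pi_{k-ell+1}]
        if m is None:
            v = periodic_value(P, R, gamma, policies)
        else:
            v = apply_cycle(P, R, gamma, v, policies, m)
        # Evaluation error eps_k: each component uniform in [0, eps].
        v = [x + rng.uniform(0.0, eps) for x in v]
        # ASSUMPTION: loss of pi_{k,ell} measured in max-norm.
        errors.append(sup_dist(v_star, periodic_value(P, R, gamma, policies)))
    return errors

def average_errors(n=8, gamma=0.98, eps=4.0, m=1, ell=1, n_iter=20,
                   runs=250, seed=0):
    # Defaults mirror n = 8, gamma = 0.98, eps = 4 and 250 runs from the text;
    # n_iter is an arbitrary choice.
    P, R = random_mdp(n, seed)
    v_star = optimal_value(P, R, gamma)
    rng = random.Random(seed + 1)
    total = [0.0] * n_iter
    for _ in range(runs):
        for k, e in enumerate(ns_ampi(P, R, gamma, v_star, m, ell, eps,
                                      n_iter, rng)):
            total[k] += e
    return [t / runs for t in total]
```

For instance, `average_errors(m=5, ell=2, runs=10)` returns the mean loss at each of the 20 iterations; sweeping ℓ over {1, 2, 5, 10} and m over {1, 2, 5, 10, 25, None} mimics the parameter grid described above. Pure Python keeps the sketch self-contained, but the full 250-run grid then takes a while to run.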