Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations
Authors: Timothy Mann, Shie Mannor
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. |
| Researcher Affiliation | Academia | Timothy A. Mann (MANN@EE.TECHNION.AC.IL), Shie Mannor (SHIE@EE.TECHNION.AC.IL); Department of Electrical Engineering, The Technion - Israel Institute of Technology, Haifa, Israel 32000 |
| Pseudocode | No | The paper describes algorithms using mathematical formulations and textual descriptions (e.g., equations (2), (3), (7)) but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statements about releasing code, links to repositories, or indications that source code for the described methodology is publicly available. |
| Open Datasets | Yes | We used parameter values γ = 0.6, β = 0.5, C = 30 and c(x) = 4x (identical to those used by Munos & Szepesvári (2008)) where β is the inverse of the mean of an exponential distribution driving the transition dynamics of the task. ... The details of the task and exact parameters used in our experiments are described in the supplementary material. |
| Dataset Splits | No | Cross-validation was used to select grid density and basis widths. However, specific details about train/validation/test splits (percentages, sample counts) are not provided in the paper's main text. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper describes the approximation methods used (e.g., 'polynomials to approximate the value function', 'linear approximations with a fixed grid of one-dimensional radial basis functions') but does not list any specific software libraries with version numbers. |
| Experiment Setup | Yes | We used parameter values γ = 0.6, β = 0.5, C = 30 and c(x) = 4x (identical to those used by Munos & Szepesvári (2008))... All results presented here used fourth degree polynomials. For the OFVI condition, we introduced a single option that keeps the product up to a point x = x + and terminates once the state equals or exceeds x. |
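
The experiment setup quoted in the table can be illustrated with a minimal sketch of a "keep until a threshold" option in an optimal replacement task. This is not the authors' code. It assumes that keeping the product adds an exponentially distributed increment with rate β to the state, and that the maintenance cost c(x) is charged at each step before the transition. The function names, the `threshold` argument, and the `max_steps` safeguard are hypothetical; the quoted text does not give the threshold value used in the paper.

```python
import random

GAMMA = 0.6  # discount factor γ from the quoted setup
BETA = 0.5   # β: inverse of the mean of the exponential increment


def maintenance_cost(x):
    """Maintenance cost c(x) = 4x from the quoted setup."""
    return 4.0 * x


def step_keep(x, rng):
    """Assumed 'keep' dynamics: the state grows by an Exp(BETA) increment."""
    return x + rng.expovariate(BETA)


def run_keep_option(x0, threshold, rng, max_steps=1000):
    """Run a hypothetical 'keep until x >= threshold' option from state x0.

    Returns (discounted_cost, duration, final_state).
    """
    x, total, discount, steps = x0, 0.0, 1.0, 0
    while x < threshold and steps < max_steps:
        total += discount * maintenance_cost(x)  # pay c(x), then transition
        x = step_keep(x, rng)
        discount *= GAMMA
        steps += 1
    return total, steps, x


if __name__ == "__main__":
    rng = random.Random(0)
    cost, duration, final_state = run_keep_option(0.0, 5.0, rng)
    print(cost, duration, final_state)
```

Because the option terminates only when the state reaches the threshold, a single backup with it spans a random number of primitive steps. The discount accumulated over that duration (`GAMMA ** steps`) is what a multi-step value backup would apply to the value of the final state.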