Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations

Authors: Timothy Mann, Shie Mannor

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice.
Researcher Affiliation | Academia | Timothy A. Mann (MANN@EE.TECHNION.AC.IL), Shie Mannor (SHIE@EE.TECHNION.AC.IL); Department of Electrical Engineering, The Technion Israel Institute of Technology, Haifa, Israel 32000
Pseudocode | No | The paper describes algorithms using mathematical formulations and textual descriptions (e.g., equations (2), (3), (7)) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statements about releasing code, links to repositories, or indications that source code for the described methodology is publicly available.
Open Datasets | Yes | We used parameter values γ = 0.6, β = 0.5, C = 30 and c(x) = 4x (identical to those used by Munos & Szepesvári (2008)) where β is the inverse of the mean of an exponential distribution driving the transition dynamics of the task. ... The details of the task and exact parameters used in our experiments are described in the supplementary material.
Dataset Splits | No | Cross-validation was used to select grid density and basis widths. However, specific details about train/validation/test splits (percentages, sample counts) are not provided in the paper's main text.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper describes the approximation methods used (e.g., 'polynomials to approximate the value function', 'linear approximations with a fixed grid of one-dimensional radial basis functions') but does not list any specific software libraries with version numbers.
Experiment Setup | Yes | We used parameter values γ = 0.6, β = 0.5, C = 30 and c(x) = 4x (identical to those used by Munos & Szepesvári (2008)) ... All results presented here used fourth degree polynomials. For the OFVI condition, we introduced a single option that keeps the product up to a point x = x + and terminates once the state equals or exceeds x.
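
Because no code was released, the following is only a minimal Python sketch of how the quoted optimal replacement parameters and the "keep until a threshold" option might be simulated. The transition and cost structure is an assumption (keeping adds an exponentially distributed amount of wear with rate β and costs c(x); replacing costs C + c(0) and restarts from fresh wear), since the page defers the task details to the supplementary material. The names `step` and `run_keep_option` and the threshold value 3.0 are hypothetical; the termination point is not legible in the quoted text.

```python
import random

# Parameter values quoted in the Experiment Setup row.
GAMMA = 0.6   # discount factor
BETA = 0.5    # rate (inverse mean) of the exponential wear increment
C = 30.0      # replacement cost


def maintenance_cost(x):
    """c(x) = 4x, the per-step cost of operating a product with wear x."""
    return 4.0 * x


def step(x, action, rng):
    """Assumed dynamics: 'keep' adds exponential wear, 'replace' restarts from 0.

    Returns (next_state, cost).
    """
    if action == "keep":
        return x + rng.expovariate(BETA), maintenance_cost(x)
    if action == "replace":
        return rng.expovariate(BETA), C + maintenance_cost(0.0)
    raise ValueError(f"unknown action: {action!r}")


def run_keep_option(x, threshold, rng):
    """Keep until the state equals or exceeds `threshold` (hypothetical value).

    Returns the final state, the discounted cost accumulated while the option
    ran, and the number of primitive steps taken.
    """
    total, discount, steps = 0.0, 1.0, 0
    while x < threshold:
        x, cost = step(x, "keep", rng)
        total += discount * cost
        discount *= GAMMA
        steps += 1
    return x, total, steps


rng = random.Random(0)
final_state, option_cost, n_steps = run_keep_option(0.0, threshold=3.0, rng=rng)
```

An option of this kind spans several primitive steps in a single decision, which is the mechanism the paper credits for reaching good policies in fewer iterations; the sketch covers only the simulation side and not the fitted value iteration itself.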