reproducibilityindex.ai

Scaling Up Approximate Value Iteration with Options: Better Policies with Fewer Iterations

Authors: Timothy Mann, Shie Mannor

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice.
Researcher Affiliation	Academia	Timothy A. Mann MANN@EE.TECHNION.AC.IL Shie Mannor SHIE@EE.TECHNION.AC.IL Department of Electrical Engineering, The Technion Israel Institute of Technology, Haifa, Israel 32000
Pseudocode	No	The paper describes algorithms using mathematical formulations and textual descriptions (e.g., equations (2), (3), (7)) but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper does not contain any statements about releasing code, links to repositories, or indications that source code for the described methodology is publicly available.
Open Datasets	Yes	We used parameter values γ = 0.6, β = 0.5, C = 30 and c(x) = 4x (identical to those used by Munos & Szepesvári (2008)) where β is the inverse of the mean of an exponential distribution driving the transition dynamics of the task. ... The details of the task and exact parameters used in our experiments are described in the supplementary material.
Dataset Splits	No	Cross-validation was used to select grid density and basis widths. However, specific details about train/validation/test splits (percentages, sample counts) are not provided in the paper's main text.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper describes the approximation methods used (e.g., 'polynomials to approximate the value function', 'linear approximations with a fixed grid of one-dimensional radial basis functions') but does not list any specific software libraries with version numbers.
Experiment Setup	Yes	We used parameter values γ = 0.6, β = 0.5, C = 30 and c(x) = 4x (identical to those used by Munos & Szepesvári (2008))... All results presented here used fourth degree polynomials. For the OFVI condition, we introduced a single option that keeps the product up to a point x = x + and terminates once the state equals or exceeds x.
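
The parameter values quoted in the Experiment Setup row can be collected into a small simulation sketch of the optimal replacement task. Only γ, β, C and c(x) = 4x come from the quoted text. The transition rule below (keeping adds an exponentially distributed amount of wear, replacing restarts from a fresh exponential draw and costs C + c(0)) is an assumption based on the usual formulation of this task, not something stated on this page. The names `step`, `maintenance_cost` and the action labels are hypothetical, and any upper bound on the state is omitted.

```python
import random

# Values quoted in the Experiment Setup row.
GAMMA = 0.6        # discount factor (listed for completeness, unused in step)
BETA = 0.5         # inverse of the mean of the exponential distribution
C_REPLACE = 30.0   # replacement cost C


def maintenance_cost(x):
    """Maintenance cost c(x) = 4x, as quoted in the table."""
    return 4.0 * x


def step(x, action, rng):
    """Sample one transition and return (next_state, cost).

    ASSUMPTION: these dynamics are an illustrative guess at the task,
    not taken from this page.
    """
    if action == "keep":
        # Wear accumulates by an exponential increment with mean 1 / BETA.
        return x + rng.expovariate(BETA), maintenance_cost(x)
    if action == "replace":
        # A new product starts from a fresh exponential draw.
        return rng.expovariate(BETA), C_REPLACE + maintenance_cost(0.0)
    raise ValueError(f"unknown action: {action!r}")
```

Calling `step(2.0, "keep", random.Random(0))` gives a cost of 8.0 and a next state no smaller than 2.0. Reproducing the paper's results would still require the exact task details, which the quoted text places in the supplementary material.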