Limiting Extrapolation in Linear Approximate Value Iteration

Authors: Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | “In our simulations we show that small levels of amplification can be achieved, and that our algorithm can effectively mitigate the divergence observed in some simple MDPs for least-squares AVI. This happens even when using identical feature representations, highlighting the benefit of bounding extrapolation through constructing feature representations as near convex combinations (versus ℓ2 or other common loss functions). Furthermore, we empirically show that small amplification factors can be obtained with relatively small sets of anchor points.” [Section 5, Numerical Simulations] “We investigate the potential benefit of LAVIER over least-squares AVI (LS-AVI). [...] The empirical results are obtained by averaging 100 simulations and they are reported with 95%-confidence intervals.”
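The “amplification” the response refers to can be illustrated with a small, self-contained sketch (this is not the paper's LAVIER procedure, and the states, features, and values below are invented for illustration): a least-squares fit expresses its prediction as a fixed linear combination of the values observed at anchor states, and outside the sampled region the absolute values of those weights can sum well above 1, whereas any convex combination of anchor values has weight ℓ1-norm exactly 1 and can never leave the range of the observed values.

```python
import numpy as np

# Illustrative sketch (not the paper's LAVIER algorithm): three anchor states
# s = 0, 1, 2 with affine features [1, s], and values known only at the anchors.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])          # feature matrix at the anchor states
v = np.array([0.0, 1.0, 0.0])       # values observed at the anchors

# A least-squares prediction at a query state s is a fixed linear combination
# of the anchor values: V(s) = phi(s)^T pinv(X) v = w(s)^T v.
phi = np.array([1.0, 4.0])          # query state s = 4, outside the anchors
w = phi @ np.linalg.pinv(X)         # weights over the anchor values
amplification = np.abs(w).sum()     # L1 norm of the weights

# The weights sum to 1 (affine fit) but are not all nonnegative, so their
# L1 norm exceeds 1; a convex combination would have L1 norm exactly 1.
print(f"least-squares weight L1 norm at s=4: {amplification:.2f}")
```

Here the L1 norm is about 3.33, so an error of size ε at the anchors can be amplified to roughly 3.33·ε at the query state; constraining the weights to a (near) convex combination caps this factor at (close to) 1, which is the behavior the quoted passage describes.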
Researcher Affiliation | Collaboration | Andrea Zanette (Institute for Computational and Mathematical Engineering, Stanford University, CA; zanette@stanford.edu); Alessandro Lazaric (Facebook AI Research; lazaric@fb.com); Mykel J. Kochenderfer (Department of Aeronautics and Astronautics, Stanford University, CA; mykel@stanford.edu); Emma Brunskill (Department of Computer Science, Stanford University, CA; ebrun@cs.stanford.edu)
Pseudocode | Yes | Algorithm 1, “LAVIER algorithm.”
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | No | The paper describes different MDP scenarios for its simulations (“Two-state MDP of Tsitsiklis and Van Roy”, “Chain MDP”, “Successive Linear Bandits”) but does not provide any links, DOIs, or formal citations for public datasets used in the experiments. These appear to be custom-defined simulation environments.
Dataset Splits | No | The paper mentions generating samples and running simulations (e.g., “1000 samples at each timestep”, “The samples are generated uniformly from the left and middle node”), but it does not specify explicit train/validation/test dataset splits for reproduction.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory specifications) used to conduct the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup | Yes | “For simplicity, we set the parameter = 0.01, and add a zero-mean noise to all rewards generated as 1/2 − Ber(1/2), where Ber(·) is a Bernoulli random variable. [...] The length of the chain is N = 50, which is also the time horizon. [...] At each state s_1, ..., s_N, we represent actions in R^2 and we generate 100 actions by uniformly discretizing the circumference. [...] The anchor points for LAVIER are chosen by our adaptive procedure for different values of the extrapolation coefficient C ∈ {1.05, 1.2, 1.5}.”
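Two pieces of the quoted setup are mechanical enough to sketch directly: the zero-mean reward noise 1/2 − Ber(1/2), and the 100 actions obtained by uniformly discretizing the circumference in R^2. The sketch below is an assumed reading of those two sentences (variable names are my own; the chain dynamics and the adaptive anchor-point procedure are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean reward noise: 1/2 - Ber(1/2) takes the values +1/2 and -1/2
# with equal probability, so its expectation is 0.
noise = 0.5 - rng.binomial(1, 0.5, size=1000)

# 100 actions represented in R^2 by uniformly discretizing the circumference;
# endpoint=False avoids duplicating the angle 0 == 2*pi.
angles = np.linspace(0.0, 2.0 * np.pi, num=100, endpoint=False)
actions = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (100, 2)
```

Each action is a unit vector, so the 100 actions are evenly spaced points on the unit circle, matching “uniformly discretizing the circumference.”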