Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator

Authors: Stephen Tu, Benjamin Recht

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct numerical experiments on LSTD for value function estimation, and Least-Squares Policy Iteration (LSPI) for an end-to-end comparison with the model-based methods in Dean et al. (2017). Our implementation is carried out in Python using numpy for linear algebraic computations and PyWren (Jonas et al., 2017) for parallelization. In our first set of experiments, we construct synthetic examples where we vary the condition number of the resulting closed-loop controllability Gramian. We find that on these instances, as the condition number increases, the number of samples required to estimate the value function to a fixed relative error also increases, as predicted by our result in Theorem 4.3. In our second set of experiments, we compare model-free policy iteration (LSPI) to two model-based methods: (a) the naïve nominal-model controller, which is designed assuming the nominal model has zero error, and (b) a controller based on a semidefinite relaxation of the non-convex robust control problem with static state feedback. Our experiments show that model-free policy iteration requires more samples than the model-based methods for the instances we consider. (A minimal sketch of the closed-loop controllability Gramian computation appears after this table.)
Researcher Affiliation | Academia | Stephen Tu and Benjamin Recht, EECS Department, University of California, Berkeley. Correspondence to: Stephen Tu <stephent@berkeley.edu>.
Pseudocode | No | The paper describes algorithms and derivations in prose and mathematical notation but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'Our implementation is carried out in Python using numpy for linear algebraic computations and PyWren (Jonas et al., 2017) for parallelization.' but does not provide any statement or link about the availability of their source code.
Open Datasets | No | The paper uses 'synthetic examples' generated by specific parameters and processes (e.g., 'We collect M independent trajectories of the system (5.1) excited by independent Gaussian noise N(0, I3) of length N = 20.') rather than a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes using prefixes of generated trajectories for evaluation ('For each trajectory, we take the first Np points for Np ∈ {100, 200, ..., 1000} and compute the LSTD estimator P̂_Np on the first Np data points.') but does not specify a train/validation/test split or cross-validation. (A sketch of this trajectory-collection and prefix-evaluation loop appears after the table.)
Hardware Specification | No | The paper mentions 'Our implementation is carried out in Python using numpy for linear algebraic computations and PyWren (Jonas et al., 2017) for parallelization.' but does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used for the experiments.
Software Dependencies | Yes | The paper mentions 'We solve the resulting SDPs using cvxpy (Diamond & Boyd, 2016) with MOSEK (2015).' and the reference for MOSEK specifies 'Version 7.1 (Revision 28).' (A generic cvxpy/MOSEK example appears after the table.)
Experiment Setup | Yes | We consider several instances of LQR with n = 5, Q = R = 0.1 I5, and γ = 0.9. For each configuration, we collect 100 trajectories of length N = 1000. For the purposes of comparison, we set K0 such that the closed-loop matrix A + BK0 = diag(0.6, 0.6, 0.6). (A sketch of reconstructing this kind of setup appears after the table.)
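
The first experiment quoted under Research Type varies the condition number of the closed-loop controllability Gramian. As a point of reference, here is a minimal numpy sketch of that quantity for an arbitrary stabilizing gain; the matrices A, B, and K below are placeholders, not the instances constructed in the paper.

```python
import numpy as np

def closed_loop_gramian_cond(A, B, K, noise_cov=None):
    """Condition number of the controllability Gramian W of the closed-loop
    system x_{t+1} = (A + B K) x_t + w_t with w_t ~ N(0, noise_cov).

    W solves the discrete Lyapunov equation W = L W L^T + noise_cov, handled
    here by vectorization: vec(W) = (I - kron(L, L))^{-1} vec(noise_cov).
    """
    n = A.shape[0]
    L = A + B @ K
    if noise_cov is None:
        noise_cov = np.eye(n)
    vec_w = np.linalg.solve(np.eye(n * n) - np.kron(L, L), noise_cov.reshape(-1))
    W = vec_w.reshape(n, n)
    return np.linalg.cond(W)

# Placeholder 3-dimensional instance (not from the paper).
A, B, K = 0.5 * np.eye(3), np.eye(3), 0.1 * np.eye(3)
print(closed_loop_gramian_cond(A, B, K))
```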
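The Open Datasets and Dataset Splits rows quote the data-generation protocol: independent trajectories driven by Gaussian noise, with the estimator recomputed on growing prefixes. The sketch below implements that loop with a textbook LSTD estimator over quadratic features; it is a simplified stand-in under assumed dynamics, not the exact estimator analyzed in the paper, and the system matrices are hypothetical.

```python
import numpy as np

def phi(x):
    # Quadratic features vec(x x^T) plus a constant to absorb the noise offset.
    return np.concatenate([np.outer(x, x).reshape(-1), [1.0]])

def rollout(A, B, K, Q, R, N, rng):
    """One trajectory of x_{t+1} = A x_t + B u_t + w_t, u_t = K x_t, w_t ~ N(0, I).
    Returns the states (N+1, n) and the stage costs c_t = x_t'Q x_t + u_t'R u_t."""
    n = A.shape[0]
    xs, cs = np.zeros((N + 1, n)), np.zeros(N)
    for t in range(N):
        u = K @ xs[t]
        cs[t] = xs[t] @ Q @ xs[t] + u @ R @ u
        xs[t + 1] = A @ xs[t] + B @ u + rng.standard_normal(n)
    return xs, cs

def lstd(trajectories, gamma, n):
    """Textbook LSTD: solve sum_t phi_t (phi_t - gamma phi_{t+1})' w = sum_t c_t phi_t,
    then read the quadratic block of w back out as a symmetric matrix P_hat."""
    d = n * n + 1
    M, b = np.zeros((d, d)), np.zeros(d)
    for xs, cs in trajectories:
        for t in range(len(cs)):
            f, f_next = phi(xs[t]), phi(xs[t + 1])
            M += np.outer(f, f - gamma * f_next)
            b += cs[t] * f
    # vec(x x^T) duplicates off-diagonal terms, so use a (minimum-norm)
    # least-squares solve instead of a plain inverse.
    w = np.linalg.lstsq(M, b, rcond=None)[0]
    P = w[:n * n].reshape(n, n)
    return 0.5 * (P + P.T)

# Hypothetical instance: 100 rollouts of length 20, estimated on growing prefixes.
rng = np.random.default_rng(0)
n = 3
A, B, K = 0.9 * np.eye(n), np.eye(n), -0.3 * np.eye(n)
Q, R, gamma = np.eye(n), np.eye(n), 0.9
data = [rollout(A, B, K, Q, R, N=20, rng=rng) for _ in range(100)]
for prefix in (5, 10, 20):
    P_hat = lstd([(xs[:prefix + 1], cs[:prefix]) for xs, cs in data], gamma, n)
    print(prefix, np.round(P_hat, 2))
```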
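The Software Dependencies row records that the SDPs were solved with cvxpy and MOSEK. The snippet below is a generic example of that tool chain on a Lyapunov-type LMI for a discounted closed-loop system; it is not the paper's robust-synthesis SDP, and the matrices are placeholders. MOSEK needs a license, so the call falls back to cvxpy's default conic solver unless solver=cp.MOSEK is passed.

```python
import cvxpy as cp
import numpy as np

# Placeholder stable closed-loop matrix and stage cost (not from the paper).
n = 3
L = 0.6 * np.eye(n)
Qc = 0.1 * np.eye(n)
gamma = 0.9

# Smallest P (in trace) satisfying P >= Qc + gamma * L' P L and P >= 0;
# the minimizer is the discounted value matrix of the closed-loop system.
P = cp.Variable((n, n), symmetric=True)
constraints = [P >> 0, gamma * L.T @ P @ L - P + Qc << 0]
problem = cp.Problem(cp.Minimize(cp.trace(P)), constraints)

problem.solve()  # or problem.solve(solver=cp.MOSEK) with a MOSEK license
print(problem.status)
print(P.value)
```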
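Finally, the Experiment Setup row fixes the closed-loop matrix through the choice of K0 and reports estimation error relative to the true value matrix. A numpy sketch of how such a setup can be reconstructed is given below; B is assumed square and invertible so that K0 has a closed form, and the specific A, B, and target closed-loop matrix are assumptions rather than the paper's instances.

```python
import numpy as np

def value_matrix(L, Qc, gamma, iters=10_000, tol=1e-12):
    """Fixed-point iteration for the discounted Lyapunov equation
    P = Qc + gamma * L^T P L (converges when sqrt(gamma) * L is stable)."""
    P = np.zeros_like(Qc)
    for _ in range(iters):
        P_next = Qc + gamma * L.T @ P @ L
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

# Placeholder dynamics; B is assumed invertible.
n = 5
A = 0.5 * np.eye(n)
B = np.eye(n)
Q = R = 0.1 * np.eye(n)
gamma = 0.9

# Pick K0 so that the closed-loop matrix A + B K0 hits a chosen stable target.
L_target = 0.6 * np.eye(n)
K0 = np.linalg.solve(B, L_target - A)

# Ground-truth cost-to-go matrix of the policy u = K0 x, used when reporting
# relative error of an estimate:  ||P_hat - P_true||_F / ||P_true||_F.
P_true = value_matrix(A + B @ K0, Q + K0.T @ R @ K0, gamma)
print(np.round(P_true, 3))
```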