Globally Convergent Policy Search for Output Estimation

Authors: Jack Umenberger, Max Simchowitz, Juan Perdomo, Kaiqing Zhang, Russ Tedrake

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. ... The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material.
Researcher Affiliation | Collaboration | Jack Umenberger, CSAIL, MIT, umnbrgr@mit.edu; Max Simchowitz, CSAIL, MIT, msimchow@mit.edu; Juan C. Perdomo, EECS, UC Berkeley, jcperdomo@berkeley.edu; Kaiqing Zhang, CSAIL & LIDS, MIT, kaiqing@mit.edu; Russ Tedrake, CSAIL, MIT, russt@mit.edu ... M.S. is supported by Amazon.com Services LLC, PO# #D-06310236 and the MIT Quest for Intelligence.
Pseudocode | Yes | Algorithm 1: Informativity-regularized Policy Gradient (IR-PG)
Open Source Code | Yes | The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material.
Open Datasets | No | The paper describes generating synthetic data for experiments ("For all experiments, we generate an OE instance by sampling the matrices A, C, G and noise covariances W1, W2 from standard normal distributions, and scaling them appropriately.") rather than using a publicly available dataset with specific access information. See the first sketch below the table.
Dataset Splits | No | The paper mentions numerical experiments but does not provide specific details on training, validation, or test data splits (e.g., percentages or sample counts).
Hardware Specification | Yes | All experiments were run on a server with an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz and 64GB of RAM. (Appendix F.6)
Software Dependencies | No | The paper mentions
Experiment Setup | Yes | The stepsize schedule for vanilla policy gradient was selected by a grid search on 10 logarithmically spaced values in [10^-3, 10^-1]. For the regularized policy gradient, we used λ = 10^-3. Our initialization was random: we sampled (A_K, B_K, C_K) as i.i.d. standard normal variables and rescaled them (Appendix F.6). See the second sketch below the table.
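
The Open Datasets row quotes how the synthetic output-estimation (OE) instances were generated. As a rough illustration only, the NumPy sketch below samples A, C, G and the noise covariances W1, W2 from standard normals and rescales them; the dimensions, the spectral-radius scaling rule, and the function name are assumptions, not the authors' exact procedure.

```python
import numpy as np

def sample_oe_instance(n=5, m=2, p=2, rho=0.9, seed=0):
    """Hypothetical sketch: sample an OE instance with state dim n, output dim m,
    and process-noise dim p, loosely following the quoted description
    (standard-normal draws followed by rescaling)."""
    rng = np.random.default_rng(seed)

    # System matrices with i.i.d. standard normal entries.
    A = rng.standard_normal((n, n))
    C = rng.standard_normal((m, n))
    G = rng.standard_normal((n, p))

    # One plausible "appropriate scaling": make A stable with spectral radius rho < 1.
    A *= rho / max(abs(np.linalg.eigvals(A)))

    # Noise covariances W1 (process) and W2 (measurement): draw standard-normal
    # factors and symmetrize so the results are positive semidefinite.
    F1 = rng.standard_normal((n, p))
    F2 = rng.standard_normal((m, m))
    W1 = F1 @ F1.T / n
    W2 = F2 @ F2.T / m
    return A, C, G, W1, W2
```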
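
The Experiment Setup row quotes the stepsize grid, the regularization weight, and the random filter initialization from Appendix F.6. A minimal sketch of how those choices might be expressed is given below; the rescaling constant, the dimensions, and the helper name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# 10 logarithmically spaced stepsizes in [1e-3, 1e-1] for vanilla policy gradient,
# and the regularization weight lambda = 1e-3 for the regularized variant, as quoted.
stepsizes = np.logspace(-3, -1, 10)
lam = 1e-3

def random_filter_init(n_K, m, p, scale=0.1, seed=0):
    """Hypothetical sketch: sample filter parameters (A_K, B_K, C_K) with i.i.d.
    standard normal entries and rescale them (the scale factor is an assumption)."""
    rng = np.random.default_rng(seed)
    A_K = scale * rng.standard_normal((n_K, n_K))
    B_K = scale * rng.standard_normal((n_K, m))
    C_K = scale * rng.standard_normal((p, n_K))
    return A_K, B_K, C_K
```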