Globally Convergent Policy Search for Output Estimation
Authors: Jack Umenberger, Max Simchowitz, Juan Perdomo, Kaiqing Zhang, Russ Tedrake
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. ... The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material. |
| Researcher Affiliation | Collaboration | Jack Umenberger CSAIL, MIT umnbrgr@mit.edu Max Simchowitz CSAIL, MIT msimchow@mit.edu Juan C. Perdomo EECS, UC Berkeley jcperdomo@berkeley.edu Kaiqing Zhang CSAIL & LIDS, MIT kaiqing@mit.edu Russ Tedrake CSAIL, MIT russt@mit.edu ... M.S. is supported by Amazon.com Services LLC, PO# #D-06310236 and the MIT Quest for Intelligence. |
| Pseudocode | Yes | Algorithm 1 Informativity-regularized Policy Gradient (IR-PG) |
| Open Source Code | Yes | The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material. |
| Open Datasets | No | The paper describes generating synthetic data for experiments ("For all experiments, we generate an OE instance by sampling the matrices A, C, G and noise covariances W_1, W_2 from standard normal distributions, and scaling them appropriately.") rather than using a publicly available dataset with specific access information. (A hedged data-generation sketch follows the table.) |
| Dataset Splits | No | The paper mentions numerical experiments but does not provide specific details on training, validation, or test data splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All experiments were run on a server with an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz and 64GB of RAM. (Appendix F.6) |
| Software Dependencies | No | The paper mentions … |
| Experiment Setup | Yes | The stepsize schedule for vanilla policy gradient was selected by a grid search on 10 logarithmically spaced values in [10⁻³, 10⁻¹]. For the regularized policy gradient, we used λ = 10⁻³. Our initialization was random: we sampled (A_K, B_K, C_K) as i.i.d. standard normal variables and rescaled them (Appendix F.6). (A hedged configuration sketch follows the table.) |
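
The Open Datasets row quotes a synthetic data-generation procedure: an OE instance is built by sampling A, C, G and the noise covariances W_1, W_2 from standard normal distributions and rescaling. The sketch below is a minimal illustration of such a generator, assuming numpy; the paper only says the matrices are "scaled appropriately", so the specific rescaling choices here (spectral-radius normalization of A, PSD factors for the covariances) and the function name are assumptions for illustration, not the authors' code.

```python
# Hedged sketch: sampling a synthetic output-estimation (OE) instance.
# The scaling rules below are assumptions; the paper does not specify them.
import numpy as np

def sample_oe_instance(n=4, m=2, p=2, rho=0.9, seed=0):
    """Sample (A, C, G, W1, W2) with standard-normal entries, then rescale.

    n: state dimension, m: output dimension, p: process-noise dimension,
    rho: assumed target spectral radius for A (to make the system stable).
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    A *= rho / np.max(np.abs(np.linalg.eigvals(A)))  # normalize spectral radius
    C = rng.standard_normal((m, n))
    G = rng.standard_normal((n, p))
    # Noise covariances: build PSD matrices from standard-normal factors.
    W1 = rng.standard_normal((p, p))
    W1 = W1 @ W1.T / p        # process-noise covariance
    W2 = rng.standard_normal((m, m))
    W2 = W2 @ W2.T / m        # measurement-noise covariance
    return A, C, G, W1, W2
```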
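
The Experiment Setup row reports a stepsize grid, a regularization weight λ, and a random initialization of the dynamic filter (A_K, B_K, C_K). The sketch below shows how one might reproduce that configuration; the rescaling factor for the initialization is not given in the quoted text and is assumed here, and this is not the authors' supplementary code.

```python
# Hedged sketch of the reported experiment configuration (Appendix F.6).
import numpy as np

def random_filter_init(n, m, scale=0.1, seed=0):
    """i.i.d. standard-normal (A_K, B_K, C_K), rescaled (scale is assumed)."""
    rng = np.random.default_rng(seed)
    A_K = scale * rng.standard_normal((n, n))
    B_K = scale * rng.standard_normal((n, m))
    C_K = scale * rng.standard_normal((m, n))
    return A_K, B_K, C_K

# Stepsize grid for vanilla policy gradient:
# 10 logarithmically spaced values in [1e-3, 1e-1].
stepsizes = np.logspace(-3, -1, 10)

# Regularization weight for the informativity-regularized variant (IR-PG).
lam = 1e-3
```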