Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Globally Convergent Policy Search for Output Estimation
Authors: Jack Umenberger, Max Simchowitz, Juan Perdomo, Kaiqing Zhang, Russ Tedrake
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. ... The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material. |
| Researcher Affiliation | Collaboration | Jack Umenberger CSAIL, MIT EMAIL Max Simchowitz CSAIL, MIT EMAIL Juan C. Perdomo EECS, UC Berkeley EMAIL Kaiqing Zhang CSAIL & LIDS, MIT EMAIL Russ Tedrake CSAIL, MIT EMAIL ... M.S. is supported by Amazon.com Services LLC, PO# #D-06310236 and the MIT Quest for Intelligence. |
| Pseudocode | Yes | Algorithm 1 Informativity-regularized Policy Gradient (IR-PG) |
| Open Source Code | Yes | The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material. |
| Open Datasets | No | The paper describes generating synthetic data for experiments ("For all experiments, we generate an OE instance by sampling the matrices A, C, G and noise covariances W1, W2 from standard normal distributions, and scaling them appropriately.") rather than using a publicly available dataset with specific access information. |
| Dataset Splits | No | The paper mentions numerical experiments but does not provide specific details on training, validation, or test data splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All experiments were run on a server with an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz and 64GB of RAM. (Appendix F.6) |
| Software Dependencies | No | The paper mentions |
| Experiment Setup | Yes | The stepsize schedule for vanilla policy gradient was selected by a grid search on 10 logarithmically spaced values in [10⁻³, 10⁻¹]. For the regularized policy gradient, we used λ = 10⁻³. Our initialization was random: we sampled (A_K, B_K, C_K) as i.i.d. standard normal variables and rescaled them (Appendix F.6). |
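
The experiment setup quoted in the table combines a logarithmically spaced step-size grid with random standard-normal initialization followed by rescaling. The sketch below illustrates both steps using only the Python standard library. It is not the authors' code: it assumes the grid endpoints read as 10^-3 and 10^-1, the dimensions `n`, `m`, and `p` are hypothetical, and the Frobenius-norm rescaling is a stand-in, since the quoted text does not say how the sampled matrices were rescaled.

```python
import math
import random


def log_spaced(lo_exp, hi_exp, num):
    """Return `num` values evenly spaced on a log10 scale,
    from 10**lo_exp to 10**hi_exp inclusive."""
    if num == 1:
        return [10.0 ** lo_exp]
    step = (hi_exp - lo_exp) / (num - 1)
    return [10.0 ** (lo_exp + i * step) for i in range(num)]


def frobenius_norm(matrix):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def sample_scaled_matrix(rows, cols, rng, target_norm=1.0):
    """Sample i.i.d. standard normal entries, then rescale.

    Rescaling to a fixed Frobenius norm is an assumption made for
    illustration; the source only says the samples were rescaled.
    """
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    scale = target_norm / frobenius_norm(matrix)
    return [[x * scale for x in row] for row in matrix]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Candidate step sizes: 10 log-spaced values (assumed range 1e-3 to 1e-1).
stepsizes = log_spaced(-3, -1, 10)

# Hypothetical dimensions: filter state n, observation m, predicted output p.
n, m, p = 4, 2, 3
A_K = sample_scaled_matrix(n, n, rng)  # filter state transition
B_K = sample_scaled_matrix(n, m, rng)  # observation input map
C_K = sample_scaled_matrix(p, n, rng)  # prediction output map
```

A grid search would then run policy gradient once per entry of `stepsizes` and keep the best-performing value; that loop is omitted here because the cost function and stopping rule are not given in this summary.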