Globally Convergent Policy Search for Output Estimation

Authors: Jack Umenberger, Max Simchowitz, Juan Perdomo, Kaiqing Zhang, Russ Tedrake

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. ... The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material.
Researcher Affiliation | Collaboration | Jack Umenberger, CSAIL, MIT, umnbrgr@mit.edu; Max Simchowitz, CSAIL, MIT, msimchow@mit.edu; Juan C. Perdomo, EECS, UC Berkeley, jcperdomo@berkeley.edu; Kaiqing Zhang, CSAIL & LIDS, MIT, kaiqing@mit.edu; Russ Tedrake, CSAIL, MIT, russt@mit.edu ... M.S. is supported by Amazon.com Services LLC, PO# #D-06310236 and the MIT Quest for Intelligence.
Pseudocode | Yes | Algorithm 1: Informativity-regularized Policy Gradient (IR-PG)
Open Source Code | Yes | The efficacy of the approach is illustrated via numerical experiments, cf. Appendix F.6. Code is provided in the supplementary material.
Open Datasets | No | The paper describes generating synthetic data for experiments ("For all experiments, we generate an OE instance by sampling the matrices A, C, G and noise covariances W1, W2 from standard normal distributions, and scaling them appropriately.") rather than using a publicly available dataset with specific access information. See the first sketch below the table.
Dataset Splits | No | The paper mentions numerical experiments but does not provide specific details on training, validation, or test data splits (e.g., percentages or sample counts).
Hardware Specification | Yes | All experiments were run on a server with an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz and 64GB of RAM. (Appendix F.6)
Software Dependencies | No | The paper mentions
Experiment Setup | Yes | The stepsize schedule for vanilla policy gradient was selected by a grid search on 10 logarithmically spaced values in [10^-3, 10^-1]. For the regularized policy gradient, we used λ = 10^-3. Our initialization was random: we sampled (A_K, B_K, C_K) as i.i.d. standard normal variables and rescaled them (Appendix F.6). See the second sketch below the table.
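
The Open Datasets row quotes how the synthetic output-estimation (OE) instances were generated. As a rough illustration only, the NumPy sketch below samples A, C, G and the noise covariances W1, W2 from standard normals and rescales them; the dimensions, the spectral-radius scaling rule, and the function name are assumptions, not the authors' exact procedure.

```python
import numpy as np

def sample_oe_instance(n=5, m=2, p=2, rho=0.9, seed=0):
    """Hypothetical sketch: sample an OE instance with state dim n, output dim m,
    and process-noise dim p, loosely following the quoted description
    (standard-normal draws followed by rescaling)."""
    rng = np.random.default_rng(seed)

    # System matrices with i.i.d. standard normal entries.
    A = rng.standard_normal((n, n))
    C = rng.standard_normal((m, n))
    G = rng.standard_normal((n, p))

    # One plausible "appropriate scaling": make A stable with spectral radius rho < 1.
    A *= rho / max(abs(np.linalg.eigvals(A)))

    # Noise covariances W1 (process) and W2 (measurement): draw standard-normal
    # factors and symmetrize so the results are positive semidefinite.
    F1 = rng.standard_normal((n, p))
    F2 = rng.standard_normal((m, m))
    W1 = F1 @ F1.T / n
    W2 = F2 @ F2.T / m
    return A, C, G, W1, W2
```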
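
The Experiment Setup row quotes the stepsize grid, the regularization weight, and the random filter initialization from Appendix F.6. A minimal sketch of how those choices might be expressed is given below; the rescaling constant, the dimensions, and the helper name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# 10 logarithmically spaced stepsizes in [1e-3, 1e-1] for vanilla policy gradient,
# and the regularization weight lambda = 1e-3 for the regularized variant, as quoted.
stepsizes = np.logspace(-3, -1, 10)
lam = 1e-3

def random_filter_init(n_K, m, p, scale=0.1, seed=0):
    """Hypothetical sketch: sample filter parameters (A_K, B_K, C_K) with i.i.d.
    standard normal entries and rescale them (the scale factor is an assumption)."""
    rng = np.random.default_rng(seed)
    A_K = scale * rng.standard_normal((n_K, n_K))
    B_K = scale * rng.standard_normal((n_K, m))
    C_K = scale * rng.standard_normal((p, n_K))
    return A_K, B_K, C_K
```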