Differentiable MPC for End-to-end Planning and Control
Authors: Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, J. Zico Kolter
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present several results that highlight the performance and capabilities of differentiable MPC in comparison to neural network policies and vanilla system identification (SysId). We show 1) superior runtime performance compared to an unrolled solver, 2) the ability of our method to recover the cost and dynamics of a controller with imitation, and 3) the benefit of directly optimizing the task loss over vanilla SysId. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Georgia Tech 3Bosch Center for AI |
| Pseudocode | Yes | Module 1: Differentiable LQR (the LQR algorithm is defined in Appendix A). Input: initial state x_init; parameters θ = {C, c, F, f}. Forward pass: 1) τ*_{1:T} = LQR_T(x_init; C, c, F, f), solving (2); 2) compute λ*_{1:T} with (7). Backward pass: 1) d_{τ,1:T} = LQR_T(0; C, ∇_{τ*}ℓ, F, 0), solving (9), ideally reusing the factorizations from the forward pass; 2) compute d_{λ,1:T} with (7); 3) compute the derivatives of ℓ with respect to C, c, F, f, and x_init with (8). (A minimal LQR sketch illustrating this forward/backward structure appears after the table.) |
| Open Source Code | Yes | We have released our differentiable MPC solver as a standalone open source package that is available at https://github.com/locuslab/mpc.pytorch and our experimental code for this paper is also openly available at https://github.com/locuslab/differentiable-mpc. |
| Open Datasets | No | We collected a dataset of trajectories from an expert controller and vary the number of trajectories our models are trained on. (The paper generates its own data from an expert controller, but does not provide access information for this generated dataset.) |
| Dataset Splits | No | More information about the training and validation losses are in Appendix B. (While validation loss is mentioned, the main text does not provide specific details on the dataset split for validation.) |
| Hardware Specification | No | A single trial of our experiments takes 1-2 hours on a modern CPU. (This is too vague and does not provide specific hardware details.) |
| Software Dependencies | No | Our experiments are implemented with PyTorch [Paszke et al., 2017]. (Only the software name "PyTorch" is mentioned without a specific version number.) |
| Experiment Setup | Yes | We do learning by differentiating L with respect to θ̂ (using mini-batches with 32 examples) and taking gradient steps with RMSprop [Tieleman and Hinton, 2012]. and We optimize the nn setting with Adam [Kingma and Ba, 2014] with a learning rate of 10⁻⁴ and all other settings are optimized with RMSprop [Tieleman and Hinton, 2012] with a learning rate of 10⁻² and a decay term of 0.5. and simultaneously learning the weights wg and goal state τg is unstable and in our experiments we alternate learning of wg and τg independently every 10 epochs. (A training-loop sketch reflecting these settings appears after the table.) |
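
The pseudocode row above summarizes Module 1: an LQR solve in the forward pass and, in the backward pass, a second LQR solve with zero initial state whose linear cost term is the gradient of the loss at the optimal trajectory. The sketch below is a minimal, unbatched PyTorch implementation of the forward LQR solve for the cost 0.5 τ_tᵀ C_t τ_t + c_tᵀ τ_t and dynamics x_{t+1} = F_t τ_t + f_t; the function name, shapes, and variable names are illustrative and are not taken from the released package.

```python
import torch


def lqr_forward(x_init, C, c, F, f):
    """Minimal, unbatched LQR solve (illustrative shapes, not the released API).

    Cost per step over tau_t = [x_t; u_t]:  0.5 * tau_t^T C_t tau_t + c_t^T tau_t
    Dynamics:                               x_{t+1} = F_t tau_t + f_t

    Assumed shapes: C (T, n, n) with each C_t symmetric and its control block
    positive definite, c (T, n), F (T, n_x, n), f (T, n_x), x_init (n_x,),
    where n = n_x + n_u. Returns the optimal states (T, n_x) and controls (T, n_u).
    """
    T, n_x = F.shape[0], F.shape[1]

    # Backward Riccati recursion for the affine feedback terms K_t, k_t.
    Ks, ks = [None] * T, [None] * T
    V = torch.zeros(n_x, n_x)
    v = torch.zeros(n_x)
    for t in reversed(range(T)):
        Q = C[t] + F[t].T @ V @ F[t]
        q = c[t] + F[t].T @ (V @ f[t] + v)
        Q_uu, Q_ux, Q_xx = Q[n_x:, n_x:], Q[n_x:, :n_x], Q[:n_x, :n_x]
        q_u, q_x = q[n_x:], q[:n_x]
        K = -torch.linalg.solve(Q_uu, Q_ux)
        k = -torch.linalg.solve(Q_uu, q_u)
        V = Q_xx + Q_ux.T @ K
        V = 0.5 * (V + V.T)  # keep the value-function Hessian symmetric
        v = q_x + Q_ux.T @ k
        Ks[t], ks[t] = K, k

    # Forward rollout under the optimal affine policy u_t = K_t x_t + k_t.
    xs, us = [x_init], []
    for t in range(T):
        u = Ks[t] @ xs[t] + ks[t]
        us.append(u)
        if t + 1 < T:
            xs.append(F[t] @ torch.cat([xs[t], u]) + f[t])
    return torch.stack(xs), torch.stack(us)
```

Because the recursion is written with differentiable tensor operations, autograd could in principle unroll through it; the point of Module 1 is that its backward pass instead re-solves an LQR problem of the form LQR_T(0; C, ∇_{τ*}ℓ, F, 0), avoiding that unrolling and ideally reusing the factorizations from the forward pass.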
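
The Experiment Setup row maps onto a short imitation-learning training loop: mini-batches of 32, RMSprop with a learning rate of 10⁻², and alternating updates of the cost weights w_g and goal state τ_g every 10 epochs. The sketch below illustrates that loop on a toy differentiable "controller" standing in for the MPC layer; the toy controller, the synthetic expert data, and the mapping of the paper's "decay term of 0.5" onto RMSprop's `alpha` smoothing constant are all assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

n_state, horizon, n_traj = 3, 5, 256


def controller(x_init, tau_g, w_g):
    """Toy stand-in for the differentiable MPC layer.

    Maps a batch of initial states (B, n_state) and learnable cost parameters
    (goal state tau_g, weights w_g) to trajectories of shape
    (B, horizon, n_state) in a differentiable way.
    """
    pull = torch.sigmoid(w_g)                                   # per-dimension pull toward the goal
    steps = torch.linspace(0.0, 1.0, horizon).view(1, horizon, 1)
    x0 = x_init.unsqueeze(1)                                    # (B, 1, n_state)
    return x0 + steps * pull * (tau_g - x0)


# Synthetic "expert" trajectories generated from ground-truth parameters.
with torch.no_grad():
    tau_g_true = torch.tensor([1.0, -2.0, 0.5])
    w_g_true = torch.tensor([2.0, 0.0, -1.0])
    x_init_all = torch.randn(n_traj, n_state)
    tau_expert_all = controller(x_init_all, tau_g_true, w_g_true)

# Learnable cost parameters, as in the goal-state/weight parameterization.
tau_g = torch.zeros(n_state, requires_grad=True)
w_g = torch.zeros(n_state, requires_grad=True)

# RMSprop with lr 1e-2; alpha=0.5 is an assumed reading of the "decay term of 0.5".
opt_w = torch.optim.RMSprop([w_g], lr=1e-2, alpha=0.5)
opt_goal = torch.optim.RMSprop([tau_g], lr=1e-2, alpha=0.5)

batch_size, num_epochs = 32, 60
for epoch in range(num_epochs):
    # Alternate which parameter group is updated every 10 epochs, since
    # learning w_g and tau_g simultaneously was reported to be unstable.
    opt = opt_w if (epoch // 10) % 2 == 0 else opt_goal
    perm = torch.randperm(n_traj)
    for i in range(0, n_traj, batch_size):
        idx = perm[i:i + batch_size]
        tau_pred = controller(x_init_all[idx], tau_g, w_g)
        loss = (tau_pred - tau_expert_all[idx]).pow(2).mean()   # imitation loss on trajectories
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Swapping the toy controller for the differentiable MPC layer (and Adam with a learning rate of 10⁻⁴ for the neural-network baseline) recovers the optimization setup described in the quoted text.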