Differentiable MPC for End-to-end Planning and Control
Authors: Brandon Amos, Ivan Jimenez, Jacob Sacks, Byron Boots, J. Zico Kolter
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present several results that highlight the performance and capabilities of differentiable MPC in comparison to neural network policies and vanilla system identification (SysId). We show 1) superior runtime performance compared to an unrolled solver, 2) the ability of our method to recover the cost and dynamics of a controller with imitation, and 3) the benefit of directly optimizing the task loss over vanilla SysId. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University 2Georgia Tech 3Bosch Center for AI |
| Pseudocode | Yes | Module 1: Differentiable LQR (the LQR algorithm is defined in Appendix A). Input: initial state x_init; parameters θ = {C, c, F, f}. Forward pass: 1) τ*_{1:T} = LQR_T(x_init; C, c, F, f), solving (2); 2) compute λ*_{1:T} with (7). Backward pass: 1) d_{τ,1:T} = LQR_T(0; C, ∇_{τ*}ℓ, F, 0), solving (9), ideally reusing the factorizations from the forward pass; 2) compute d_{λ,1:T} with (7); 3) compute the derivatives of ℓ with respect to C, c, F, f, and x_init with (8). (A minimal LQR sketch illustrating this forward/backward structure appears after the table.) |
| Open Source Code | Yes | We have released our differentiable MPC solver as a standalone open source package that is available at https://github.com/locuslab/mpc.pytorch and our experimental code for this paper is also openly available at https://github.com/locuslab/differentiable-mpc. |
| Open Datasets | No | We collected a dataset of trajectories from an expert controller and vary the number of trajectories our models are trained on. (The paper generates its own data from an expert controller, but does not provide access information for this generated dataset.) |
| Dataset Splits | No | More information about the training and validation losses are in Appendix B. (While validation loss is mentioned, the main text does not provide specific details on the dataset split for validation.) |
| Hardware Specification | No | A single trial of our experiments takes 1-2 hours on a modern CPU. (This is too vague and does not provide specific hardware details.) |
| Software Dependencies | No | Our experiments are implemented with PyTorch [Paszke et al., 2017]. (Only the software name "PyTorch" is mentioned without a specific version number.) |
| Experiment Setup | Yes | We do learning by differentiating L with respect to θ̂ (using mini-batches with 32 examples) and taking gradient steps with RMSprop [Tieleman and Hinton, 2012]. and We optimize the nn setting with Adam [Kingma and Ba, 2014] with a learning rate of 10⁻⁴ and all other settings are optimized with RMSprop [Tieleman and Hinton, 2012] with a learning rate of 10⁻² and a decay term of 0.5. and simultaneously learning the weights wg and goal state τg is unstable and in our experiments we alternate learning of wg and τg independently every 10 epochs. (A training-loop sketch reflecting these settings appears after the table.) |
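
The pseudocode row above summarizes Module 1: an LQR solve in the forward pass and, in the backward pass, a second LQR solve with zero initial state whose linear cost term is the gradient of the loss at the optimal trajectory. The sketch below is a minimal, unbatched PyTorch implementation of the forward LQR solve for the cost 0.5 τ_tᵀ C_t τ_t + c_tᵀ τ_t and dynamics x_{t+1} = F_t τ_t + f_t; the function name, shapes, and variable names are illustrative and are not taken from the released package.

```python
import torch


def lqr_forward(x_init, C, c, F, f):
    """Minimal, unbatched LQR solve (illustrative shapes, not the released API).

    Cost per step over tau_t = [x_t; u_t]:  0.5 * tau_t^T C_t tau_t + c_t^T tau_t
    Dynamics:                               x_{t+1} = F_t tau_t + f_t

    Assumed shapes: C (T, n, n) with each C_t symmetric and its control block
    positive definite, c (T, n), F (T, n_x, n), f (T, n_x), x_init (n_x,),
    where n = n_x + n_u. Returns the optimal states (T, n_x) and controls (T, n_u).
    """
    T, n_x = F.shape[0], F.shape[1]

    # Backward Riccati recursion for the affine feedback terms K_t, k_t.
    Ks, ks = [None] * T, [None] * T
    V = torch.zeros(n_x, n_x)
    v = torch.zeros(n_x)
    for t in reversed(range(T)):
        Q = C[t] + F[t].T @ V @ F[t]
        q = c[t] + F[t].T @ (V @ f[t] + v)
        Q_uu, Q_ux, Q_xx = Q[n_x:, n_x:], Q[n_x:, :n_x], Q[:n_x, :n_x]
        q_u, q_x = q[n_x:], q[:n_x]
        K = -torch.linalg.solve(Q_uu, Q_ux)
        k = -torch.linalg.solve(Q_uu, q_u)
        V = Q_xx + Q_ux.T @ K
        V = 0.5 * (V + V.T)  # keep the value-function Hessian symmetric
        v = q_x + Q_ux.T @ k
        Ks[t], ks[t] = K, k

    # Forward rollout under the optimal affine policy u_t = K_t x_t + k_t.
    xs, us = [x_init], []
    for t in range(T):
        u = Ks[t] @ xs[t] + ks[t]
        us.append(u)
        if t + 1 < T:
            xs.append(F[t] @ torch.cat([xs[t], u]) + f[t])
    return torch.stack(xs), torch.stack(us)
```

Because the recursion is written with differentiable tensor operations, autograd could in principle unroll through it; the point of Module 1 is that its backward pass instead re-solves an LQR problem of the form LQR_T(0; C, ∇_{τ*}ℓ, F, 0), avoiding that unrolling and ideally reusing the factorizations from the forward pass.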
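
The Experiment Setup row maps onto a short imitation-learning training loop: mini-batches of 32, RMSprop with a learning rate of 10⁻², and alternating updates of the cost weights w_g and goal state τ_g every 10 epochs. The sketch below illustrates that loop on a toy differentiable "controller" standing in for the MPC layer; the toy controller, the synthetic expert data, and the mapping of the paper's "decay term of 0.5" onto RMSprop's `alpha` smoothing constant are all assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

n_state, horizon, n_traj = 3, 5, 256


def controller(x_init, tau_g, w_g):
    """Toy stand-in for the differentiable MPC layer.

    Maps a batch of initial states (B, n_state) and learnable cost parameters
    (goal state tau_g, weights w_g) to trajectories of shape
    (B, horizon, n_state) in a differentiable way.
    """
    pull = torch.sigmoid(w_g)                                   # per-dimension pull toward the goal
    steps = torch.linspace(0.0, 1.0, horizon).view(1, horizon, 1)
    x0 = x_init.unsqueeze(1)                                    # (B, 1, n_state)
    return x0 + steps * pull * (tau_g - x0)


# Synthetic "expert" trajectories generated from ground-truth parameters.
with torch.no_grad():
    tau_g_true = torch.tensor([1.0, -2.0, 0.5])
    w_g_true = torch.tensor([2.0, 0.0, -1.0])
    x_init_all = torch.randn(n_traj, n_state)
    tau_expert_all = controller(x_init_all, tau_g_true, w_g_true)

# Learnable cost parameters, as in the goal-state/weight parameterization.
tau_g = torch.zeros(n_state, requires_grad=True)
w_g = torch.zeros(n_state, requires_grad=True)

# RMSprop with lr 1e-2; alpha=0.5 is an assumed reading of the "decay term of 0.5".
opt_w = torch.optim.RMSprop([w_g], lr=1e-2, alpha=0.5)
opt_goal = torch.optim.RMSprop([tau_g], lr=1e-2, alpha=0.5)

batch_size, num_epochs = 32, 60
for epoch in range(num_epochs):
    # Alternate which parameter group is updated every 10 epochs, since
    # learning w_g and tau_g simultaneously was reported to be unstable.
    opt = opt_w if (epoch // 10) % 2 == 0 else opt_goal
    perm = torch.randperm(n_traj)
    for i in range(0, n_traj, batch_size):
        idx = perm[i:i + batch_size]
        tau_pred = controller(x_init_all[idx], tau_g, w_g)
        loss = (tau_pred - tau_expert_all[idx]).pow(2).mean()   # imitation loss on trajectories
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Swapping the toy controller for the differentiable MPC layer (and Adam with a learning rate of 10⁻⁴ for the neural-network baseline) recovers the optimization setup described in the quoted text.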