Revisiting Implicit Differentiation for Learning Problems in Optimal Control
Authors: Ming Xu, Timothy L. Molloy, Stephen Gould
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on both a synthetic benchmark and four challenging learning-from-demonstration benchmarks, including a 6-DoF maneuvering quadrotor and 6-DoF rocket-powered landing. |
| Researcher Affiliation | Academia | Ming Xu, School of Computing, Australian National University, mingda.xu@anu.edu.au; Timothy Molloy, School of Engineering, Australian National University, timothy.molloy@anu.edu.au; Stephen Gould, School of Computing, Australian National University, stephen.gould@anu.edu.au |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at https://github.com/mingu6/Implicit-Diff-Optimal-Control |
| Open Datasets | No | The paper mentions specific simulation environments (cartpole, 6-DoF quadrotor, 2-link robot arm, 6-DoF rocket landing) and states that demonstration trajectories are used, some generated by solving COC problems and others potentially real (LfD implies demonstrations). However, it does not provide access information (URL, DOI, or citation with author/year) for these demonstration datasets, should they be external or publicly available. |
| Dataset Splits | No | The paper mentions using 'five imitation learning trials' and that 'for each trial, θ is initialized by adding uniform noise to the true value'. It also states 'Up to five demonstration trajectories are used in the LfD setting with no inequality constraints' and 'For the setting with inequality constraints present, we use only one demonstration trajectory'. This describes how data is used across trials but does not specify train/test/validation splits of a given dataset, nor does it refer to predefined splits from a standard benchmark. |
| Hardware Specification | Yes | Experiments are run on an AMD Ryzen 9 7900X 12-core processor with 128 GB of RAM under Ubuntu 20.04. |
| Software Dependencies | Yes | All experiments in this section are run on a single thread of an AMD Ryzen 9 7900X 4.7 GHz desktop CPU. We implemented the identities in Equation 4 to verify our claims around numerical stability and computational efficiency. In our non-optimized Python implementation of IDOC, we batch together and vectorize computations involving blocks with an identical number of active constraints. As expected, trajectory derivatives for inequality-constrained problems are slightly slower to compute than the equivalent without (hard) inequality constraints (log-barrier approximation), owing to the computational overhead of identifying and batching blocks. The IPOPT solver [45] is used in the forward pass to solve the COC problem, and Lagrange multipliers λ are extracted from the solver output. The numerical linear algebra library NumPy [16] is also used. (Hedged sketches of the forward pass and the batching appear below the table.) |
| Experiment Setup | Yes | For all methods, we use gradient descent with the same learning rate for a given environment. Table 2 (additional hyperparameters for LfD experiments) — Cartpole (E): γ = 0.1, lr = 10⁻⁴, ndemos = 5; Quadrotor (E): γ = 0.1, lr = 10⁻⁴, ndemos = 2; Robotarm (E): γ = 0.1, lr = 10⁻⁴, ndemos = 4; Rocket (E): γ = 0.1, lr = 3×10⁻⁴, ndemos = 1; Cartpole (I): γ = 0.01, 0.1, lr = 8×10⁻⁵, ndemos = 1; Quadrotor (I): γ = 0.01, 0.15, lr = 2×10⁻⁴, ndemos = 1; Robotarm (I): γ = 0.01, 0.2, lr = 2×10⁻³, ndemos = 1; Rocket (I): γ = 1, 0.1, lr = 10⁻⁵, ndemos = 1. (A sketch of this learning loop appears below the table.) |
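
To make the forward pass concrete: the paper states that IPOPT solves the COC problem and that the Lagrange multipliers λ are read off the solver output. Below is a minimal sketch of that extraction using the `cyipopt` Python bindings on a toy equality-constrained problem; the toy objective, constraint, and all variable names are illustrative assumptions, not the paper's actual COC formulation, and we assume `cyipopt` behaves as its documentation describes.

```python
import numpy as np
import cyipopt

class ToyProblem:
    """Toy stand-in for a COC problem: min 0.5*||x||^2  s.t.  x0 + x1 = 1.
    With no hessian method defined, cyipopt falls back to IPOPT's
    limited-memory Hessian approximation."""

    def objective(self, x):
        return 0.5 * np.dot(x, x)

    def gradient(self, x):
        return x

    def constraints(self, x):
        return np.array([x[0] + x[1]])

    def jacobian(self, x):
        return np.array([1.0, 1.0])

x0 = np.zeros(2)
nlp = cyipopt.Problem(
    n=2, m=1, problem_obj=ToyProblem(),
    lb=np.full(2, -1e20), ub=np.full(2, 1e20),  # effectively unbounded variables
    cl=np.array([1.0]), cu=np.array([1.0]),     # cl == cu encodes an equality constraint
)
nlp.add_option("print_level", 0)

x_opt, info = nlp.solve(x0)
lam = info["mult_g"]  # Lagrange multipliers of the constraints, as extracted in the paper
```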
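The batching strategy quoted above (vectorizing computations over blocks with an identical number of active constraints) can be sketched in plain NumPy. This is a generic illustration under assumed inputs, not IDOC's actual derivative computation: `blocks` stands in for per-time-step sub-matrices whose size varies with the number of active constraints at that step.

```python
import numpy as np
from collections import defaultdict

def batched_block_solves(blocks, rhs):
    """Solve A_t z_t = b_t for every per-time-step block, vectorizing over
    groups of blocks that share the same size (i.e., the same number of
    active constraints).

    blocks: list of (k_t, k_t) ndarrays; rhs: list of (k_t,) ndarrays.
    Returns solutions in the original time-step order.
    """
    # Group time steps by block size (number of active constraints).
    groups = defaultdict(list)
    for t, A in enumerate(blocks):
        groups[A.shape[0]].append(t)

    out = [None] * len(blocks)
    for k, idx in groups.items():
        A = np.stack([blocks[t] for t in idx])        # (n_k, k, k)
        b = np.stack([rhs[t] for t in idx])           # (n_k, k)
        z = np.linalg.solve(A, b[..., None])[..., 0]  # one batched solve per group
        for j, t in enumerate(idx):
            out[t] = z[j]
    return out

# Tiny usage example with blocks of two different sizes.
rng = np.random.default_rng(0)
blocks = [np.eye(2), rng.normal(size=(3, 3)) + 3 * np.eye(3), 2 * np.eye(2)]
rhs = [np.ones(2), np.ones(3), np.ones(2)]
print(batched_block_solves(blocks, rhs))
```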
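Finally, the experiment setup row amounts to a plain gradient-descent learning loop with a fixed per-environment learning rate, with θ initialized in each trial by adding uniform noise to the true value (as quoted under Dataset Splits). The sketch below substitutes a toy quadratic surrogate for the imitation loss so it runs end to end; `theta_true`, the noise scale, and `loss_and_grad` are illustrative placeholders for the paper's COC forward pass and IDOC gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-ins (not from the paper): true COC parameters and noise scale.
theta_true = np.array([1.0, 0.5, 2.0])
theta = theta_true + rng.uniform(-0.5, 0.5, size=theta_true.shape)  # noisy init per trial

lr = 1e-4  # fixed per-environment learning rate, e.g. Cartpole (E) in Table 2

def loss_and_grad(theta):
    # Toy quadratic surrogate for the imitation loss over demonstration
    # trajectories; the real loop would solve the COC problem with IPOPT
    # and backpropagate through it via IDOC's trajectory derivatives.
    diff = theta - theta_true
    return diff @ diff, 2.0 * diff

for step in range(5000):
    loss, grad = loss_and_grad(theta)
    theta -= lr * grad  # plain gradient descent, as stated for all methods
```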