DeSKO: Stability-Assured Robust Control with a Deep Stochastic Koopman Operator
Authors: Minghao Han, Jacob Euler-Rolle, Robert K. Katzschmann
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Modeling and control experiments on several advanced control benchmarks show that our framework is more robust and scalable than state-of-the-art deep Koopman operators and reinforcement learning methods. Tested control benchmarks include a soft robotic arm, a legged robot, and a biological gene regulatory network. We illustrate four simulated robotic modeling and control problems to show the general applicability of DeSKO. First, the classic control problem of CartPole balancing from the control and Reinforcement Learning (RL) literature (Barto et al., 1983) is illustrated. Then, we consider more complicated high-dimensional continuous control problems of robots, such as the legged robot HalfCheetah and the soft robotic arm SoPrA. We simulate the HalfCheetah in the MuJoCo physics engine (Todorov et al., 2012) and the SoPrA (Toshimitsu et al., 2021) in the DRAKE simulation toolbox (Tedrake & the Drake Development Team, 2019). Lastly, we apply DeSKO to autonomous systems in cell biology, i.e., biological gene regulatory networks (GRN) (Elowitz & Leibler, 2000). |
| Researcher Affiliation | Collaboration | Minghao Han (1,2), Jacob Euler-Rolle (2), Robert K. Katzschmann (2); (1) Department of Control Science and Engineering, Harbin Institute of Technology; (2) Soft Robotics Lab, ETH Zurich; {minhan,ejacob,rkk}@ethz.ch |
| Pseudocode | Yes | Algorithm 1 Robust MPC with DeSKO. Require: weighting matrices Q, R, P; state-feedback matrix K; prediction horizon H. Initialize µ̂₁ ← µθ(x₁). For t = 1, 2, …: solve (6)–(7) to obtain the nominal input c̄ₜ; apply uₜ = c̄ₜ + K(µθ(xₜ) − µ̂ₜ) to the system (2); update µ̂ₜ₊₁ ← Aµ̂ₜ + Bc̄ₜ. (A hedged Python sketch of this loop is given after the table.) |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for the described methodology, nor does it provide a direct link to a source-code repository. |
| Open Datasets | Yes | We simulate the HalfCheetah in the MuJoCo physics engine (Todorov et al., 2012) and the SoPrA (Toshimitsu et al., 2021) in the DRAKE simulation toolbox (Tedrake & the Drake Development Team, 2019). |
| Dataset Splits | Yes | For each environment, a training set composed of 40000 state-action pairs and a validation set of 4000 state-action pairs were collected. The actions were collected by uniformly sampling over the action space. Both methods were trained to minimize the cumulative prediction error over a time horizon of 16, and at each update step, a batch of 256 data-points was randomly sampled for the gradient-descent update. |
| Hardware Specification | No | The paper mentions simulating HalfCheetah in MuJoCo and SoPrA in DRAKE, but it does not specify the hardware (e.g., GPU/CPU models, memory) on which these simulations or the model training were performed. |
| Software Dependencies | No | The paper mentions using specific software environments like OpenAI Gym, MuJoCo, and Drake, and optimization methods like the Adam solver (Kingma & Ba, 2017), but does not provide specific version numbers for these or other libraries/dependencies to ensure reproducibility. |
| Experiment Setup | Yes | For each environment, a training set composed of 40000 state-action pairs and a validation set of 4000 state-action pairs were collected. The actions were collected by uniformly sampling over the action space. Both methods were trained to minimize the cumulative prediction error over a time horizon of 16, and at each update step a batch of 256 data points was randomly sampled for the gradient-descent update. The same learning rate of 0.001 and decay strategy were used for both methods. SAC iteratively interacts with the environments and updates the control policy; for each environment, 1000k steps of state-action-reward pairs were collected for training. The data is collected by randomly sampling actions from a uniform distribution over the action space. The DeSKO model is trained with stochastic gradient descent; in our implementation, the Adam solver (Kingma & Ba, 2017) is used for optimization. At each step during training, a batch of 256 data points is sampled from the training set and used for the model update. Table 1: Hyperparameters of DeSKO (size of data set D: 40000; batch size: 256; learning rate: 1e-3; prediction horizon H: 16; structure of µθ(·): (256, 128, 64); structure of σθ(·): (256, 128, 64); activation function: ReLU; dimension of observables: 20; entropy threshold H: −20). (An illustrative training sketch using these hyperparameters follows the table.) |
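
The Algorithm 1 pseudocode quoted above maps directly onto a short control loop. Below is a minimal, hypothetical Python sketch of that loop, not the authors' implementation: `encoder_mean`, `solve_nominal_mpc`, and the `env` interface are assumed placeholder names, and the nominal optimization (6)–(7) is treated as a black-box solver.

```python
# Hypothetical sketch of Algorithm 1 (robust MPC with a learned stochastic Koopman model).
# Placeholder names: `encoder_mean` plays the role of the mean network mu_theta,
# `solve_nominal_mpc` stands in for the nominal problem (6)-(7), and `env` follows a
# Gym-style step interface. A, B are the learned latent dynamics; K, H are as in the paper.
import numpy as np

def robust_mpc_rollout(env, encoder_mean, solve_nominal_mpc, A, B, K, num_steps, H):
    """Nominal MPC on the latent (observable) state, plus a state-feedback term that
    rejects the deviation between the encoded measurement and the nominal prediction."""
    x = env.reset()
    mu_hat = encoder_mean(x)                        # mu_hat_1 <- mu_theta(x_1)
    for t in range(num_steps):
        c_bar = solve_nominal_mpc(mu_hat, A, B, H)  # solve the nominal problem over horizon H
        u = c_bar + K @ (encoder_mean(x) - mu_hat)  # u_t = c_bar_t + K (mu_theta(x_t) - mu_hat_t)
        x, _, done, _ = env.step(u)                 # apply u_t to the true system (2)
        mu_hat = A @ mu_hat + B @ c_bar             # propagate the nominal latent model
        if done:
            break
    return x
```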
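
The Experiment Setup row reports the DeSKO hyperparameters (MLP encoders of widths (256, 128, 64), ReLU activations, 20 observables, batch size 256, learning rate 1e-3, prediction horizon 16, Adam). The sketch below wires those numbers into an illustrative PyTorch training step; the class and function names are invented for illustration, and the multi-step squared-error loss is a generic stand-in for the paper's actual objective, not a reproduction of it.

```python
# Illustrative PyTorch sketch of the reported encoder structure and training hyperparameters.
import torch
import torch.nn as nn

LATENT_DIM = 20          # dimension of observables
HIDDEN = (256, 128, 64)  # structure of mu_theta and sigma_theta
BATCH_SIZE = 256
LEARNING_RATE = 1e-3
HORIZON = 16             # horizon for the cumulative prediction error

def mlp(in_dim, out_dim, hidden=HIDDEN):
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ReLU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

class StochasticKoopmanEncoder(nn.Module):
    """Mean and std networks mapping states to a distribution over observables."""
    def __init__(self, state_dim):
        super().__init__()
        self.mu_net = mlp(state_dim, LATENT_DIM)
        self.log_sigma_net = mlp(state_dim, LATENT_DIM)

    def forward(self, x):
        return self.mu_net(x), self.log_sigma_net(x).exp()

def training_step(encoder, A, B, optimizer, states, actions):
    """One gradient step on a batch of length-(HORIZON+1) trajectories.
    states: (BATCH_SIZE, HORIZON + 1, state_dim); actions: (BATCH_SIZE, HORIZON, act_dim)."""
    mu, _ = encoder(states[:, 0])
    loss = 0.0
    for k in range(HORIZON):
        mu = mu @ A.T + actions[:, k] @ B.T           # roll the linear latent model forward
        target_mu, _ = encoder(states[:, k + 1])
        loss = loss + ((mu - target_mu) ** 2).mean()  # generic cumulative prediction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example wiring (shapes assumed):
# encoder = StochasticKoopmanEncoder(state_dim)
# A = nn.Parameter(0.99 * torch.eye(LATENT_DIM))          # latent dynamics, assumed learnable
# B = nn.Parameter(torch.zeros(LATENT_DIM, act_dim))
# optimizer = torch.optim.Adam(list(encoder.parameters()) + [A, B], lr=LEARNING_RATE)
```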