Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
Authors: Jiachen Hu, Han Zhong, Chi Jin, Liwei Wang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Despite the empirical success of sim-to-real transfer, its theoretical foundation is much less understood. In this paper, we study sim-to-real transfer in continuous domain with partial observations, where the simulated environments and real-world environments are modeled by linear quadratic Gaussian (LQG) systems. We show that a popular robust adversarial training algorithm is capable of learning a policy from the simulated environment that is competitive with the optimal policy in the real-world environment. To achieve our results, we design a new algorithm for infinite-horizon average-cost LQGs and establish a regret bound that depends on the intrinsic complexity of the model class. Our algorithm crucially relies on a novel history clipping scheme, which might be of independent interest. |
| Researcher Affiliation | Academia | Jiachen Hu, School of Computer Science, Peking University (NickH@pku.edu.cn). Han Zhong, Center for Data Science, Peking University (hanzhong@stu.pku.edu.cn). Chi Jin, Department of Electrical and Computer Engineering, Princeton University (chij@princeton.edu). Liwei Wang, National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; Center for Data Science, Peking University; Beijing Institute of Big Data Research (wanglw@cis.pku.edu.cn). |
| Pseudocode | Yes | Algorithm 1 LQG-VTR ... Algorithm 2 Model Selection |
| Open Source Code | No | The paper does not contain any statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve empirical evaluation on datasets. It models environments using Linear Quadratic Gaussian (LQG) systems and focuses on theoretical analysis. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation on datasets. Therefore, no dataset split information for training, validation, or testing is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments requiring specific hardware for execution. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical analysis, thus no experimental setup details like hyperparameters or training configurations are provided. |
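The research summary above models both the simulator and the real world as LQG systems and judges a simulator-trained policy against the optimal real-world policy. As a rough illustration of that comparison only, the sketch below uses the textbook scalar case. It designs a certainty-equivalent controller for a simulator model: a steady-state Kalman filter plus an LQR gain, which is optimal for a known LQG by the separation principle. It then evaluates that controller on a slightly different "real" system and computes the exact long-run average cost from the stationary covariance of the closed loop. This is not the paper's LQG-VTR algorithm, its robust adversarial training, or its history clipping scheme. The names `ScalarLQG`, `Policy`, `design_policy`, and `stationary_cost`, and all numeric parameters, are hypothetical choices made for this example.

```python
# Hypothetical illustration (not the paper's LQG-VTR): scalar LQG sim-to-real cost gap of a
# certainty-equivalent policy (steady-state Kalman filter + LQR) designed on a simulator model.
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalarLQG:
    """x_{t+1} = a x_t + b u_t + w_t,  y_t = c x_t + v_t,  per-step cost q x_t^2 + r u_t^2."""
    a: float
    b: float
    c: float
    q: float = 1.0
    r: float = 1.0
    w_var: float = 1.0  # process-noise variance
    v_var: float = 1.0  # observation-noise variance


@dataclass(frozen=True)
class Policy:
    model: ScalarLQG  # model the policy was designed on (may differ from the true plant)
    k: float          # LQR gain: u_t = -k * xhat_{t|t}
    l: float          # steady-state Kalman gain


def _fixed_point(step, x0, iters=100_000, tol=1e-13):
    """Iterate x <- step(x) until successive iterates differ by less than tol."""
    x = x0
    for _ in range(iters):
        x_next = step(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("Riccati iteration did not converge")


def design_policy(m: ScalarLQG) -> Policy:
    """Controller that would be optimal if m were the true system."""
    def control_step(p):
        # Control Riccati: P = q + a^2 P - (a b P)^2 / (r + b^2 P)
        return m.q + m.a**2 * p - (m.a * m.b * p) ** 2 / (m.r + m.b**2 * p)

    def filter_step(s):
        # Filter Riccati for the one-step prediction error variance
        return m.a**2 * s - (m.a * m.c * s) ** 2 / (m.c**2 * s + m.v_var) + m.w_var

    p = _fixed_point(control_step, m.q)
    s = _fixed_point(filter_step, m.w_var)
    k = m.a * m.b * p / (m.r + m.b**2 * p)
    l = s * m.c / (m.c**2 * s + m.v_var)
    return Policy(m, k, l)


def _mul(x, y):
    return [[sum(x[i][t] * y[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


def _tr(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]


def stationary_cost(plant: ScalarLQG, pol: Policy, iters=100_000, tol=1e-13) -> float:
    """Exact long-run average cost of `pol` on `plant`, from the stationary covariance of
    z_t = (x_t, xhat_{t|t-1}). The filter predicts with the policy's own (possibly wrong) model."""
    m, k, l = pol.model, pol.k, pol.l
    f = [l * plant.c, 1.0 - l * m.c]  # xhat_{t|t} = f . z_t + l v_t
    g = m.a - m.b * k                 # xhat_{t+1|t} = g * xhat_{t|t}
    mat = [[plant.a - plant.b * k * f[0], -plant.b * k * f[1]], [g * f[0], g * f[1]]]
    gn = [[-plant.b * k * l, 1.0], [g * l, 0.0]]  # loadings of the noise (v_t, w_t)
    noise = _mul(_mul(gn, [[plant.v_var, 0.0], [0.0, plant.w_var]]), _tr(gn))
    s = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        ms = _mul(_mul(mat, s), _tr(mat))
        s_next = [[ms[i][j] + noise[i][j] for j in range(2)] for i in range(2)]
        done = all(abs(s_next[i][j] - s[i][j]) < tol for i in range(2) for j in range(2))
        s = s_next
        if done:
            break
    else:
        raise RuntimeError("closed loop is unstable: covariance did not converge")
    var_xhat = f[0] ** 2 * s[0][0] + 2 * f[0] * f[1] * s[0][1] + f[1] ** 2 * s[1][1] + l**2 * plant.v_var
    return plant.q * s[0][0] + plant.r * k**2 * var_xhat  # E[q x^2 + r u^2]


if __name__ == "__main__":
    real = ScalarLQG(a=1.2, b=1.0, c=1.0)  # illustrative "real-world" system
    sim = ScalarLQG(a=1.1, b=1.0, c=1.0)   # illustrative, slightly misspecified simulator
    j_opt = stationary_cost(real, design_policy(real))
    j_sim = stationary_cost(real, design_policy(sim))
    print(f"optimal cost {j_opt:.4f}, sim-designed cost {j_sim:.4f}, gap {j_sim - j_opt:.4f}")
```

Because the Kalman-filter-plus-LQR controller is optimal for the true system, the printed gap is nonnegative, and a policy that is "competitive" in the abstract's sense keeps it small. The paper studies this kind of gap for the policy produced by robust adversarial training over a class of simulators, which this single-simulator sketch does not attempt.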