Provable Sim-to-real Transfer in Continuous Domain with Partial Observations

Authors: Jiachen Hu, Han Zhong, Chi Jin, Liwei Wang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Despite the empirical success of the sim-to-real transfer, its theoretical foundation is much less understood. In this paper, we study the sim-to-real transfer in continuous domain with partial observations, where the simulated environments and real-world environments are modeled by linear quadratic Gaussian (LQG) systems. We show that a popular robust adversarial training algorithm is capable of learning a policy from the simulated environment that is competitive to the optimal policy in the real-world environment. To achieve our results, we design a new algorithm for infinite-horizon average-cost LQGs and establish a regret bound that depends on the intrinsic complexity of the model class. Our algorithm crucially relies on a novel history clipping scheme, which might be of independent interest.
Researcher Affiliation | Academia | Jiachen Hu, School of Computer Science, Peking University, NickH@pku.edu.cn; Han Zhong, Center for Data Science, Peking University, hanzhong@stu.pku.edu.cn; Chi Jin, Department of Electrical and Computer Engineering, Princeton University, chij@princeton.edu; Liwei Wang, National Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; Center for Data Science, Peking University, Beijing Institute of Big Data Research, wanglw@cis.pku.edu.cn
Pseudocode | Yes | Algorithm 1 LQG-VTR ... Algorithm 2 Model Selection
Open Source Code | No | The paper does not contain any statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not involve empirical evaluation on datasets. It models environments using Linear Quadratic Gaussian (LQG) systems and focuses on theoretical analysis.
Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation on datasets. Therefore, no dataset split information for training, validation, or testing is provided.
Hardware Specification | No | The paper is theoretical and does not describe empirical experiments requiring specific hardware for execution.
Software Dependencies | No | The paper is theoretical and does not describe empirical experiments that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on algorithm design and theoretical analysis, thus no experimental setup details like hyperparameters or training configurations are provided.
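
For readers unfamiliar with the setting named in the Research Type row, the following is a minimal sketch of a scalar, partially observed LQG system and of the gap between a simulated and a real environment. It is not the paper's LQG-VTR algorithm or its robust adversarial training procedure. The function name `simulate_scalar_lqg`, the static output-feedback policy, the quadratic cost form, and all numeric parameters are assumptions chosen only for illustration.

```python
import random

def simulate_scalar_lqg(a, b, c, gain, steps, noise_std=1.0, q=1.0, r=1.0, seed=0):
    """Roll out x_{t+1} = a*x_t + b*u_t + w_t, y_t = c*x_t + z_t under u_t = -gain*y_t.

    Returns the empirical average cost (1/T) * sum(q*y_t^2 + r*u_t^2).
    The cost form and the policy are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    x = 0.0
    total_cost = 0.0
    for _ in range(steps):
        y = c * x + rng.gauss(0.0, noise_std)          # noisy partial observation of the state
        u = -gain * y                                  # static output feedback (hypothetical policy)
        total_cost += q * y * y + r * u * u            # quadratic stage cost
        x = a * x + b * u + rng.gauss(0.0, noise_std)  # linear dynamics with Gaussian process noise
    return total_cost / steps

# Same policy evaluated in a "simulator" and in a slightly different "real" system.
sim_cost = simulate_scalar_lqg(a=0.90, b=1.0, c=1.0, gain=0.5, steps=5000)
real_cost = simulate_scalar_lqg(a=0.95, b=1.0, c=1.0, gain=0.5, steps=5000)
sim_to_real_gap = real_cost - sim_cost
```

The two calls differ only in the dynamics parameter `a`, which stands in for the mismatch between simulated and real-world environments; the paper's analysis concerns how well a policy trained against a class of such simulators performs on the real system.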