Understanding Domain Randomization for Sim-to-real Transfer

Authors: Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, Liwei Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Despite its empirical successes, theoretical understanding on why this simple algorithm works is limited. In this paper, we propose a theoretical framework for sim-to-real transfers, in which the simulator is modeled as a set of MDPs with tunable parameters (corresponding to unknown physical parameters such as friction).
Researcher Affiliation | Collaboration | Xiaoyu Chen & Jiachen Hu, Key Laboratory of Machine Perception, MOE, School of Artificial Intelligence, Peking University, {cxy30, NickH}@pku.edu.cn; Chi Jin, Department of Electrical and Computer Engineering, Princeton University, chij@princeton.edu; Lihong Li, Amazon, llh@amazon.com; Liwei Wang, Key Laboratory of Machine Perception, MOE, School of Artificial Intelligence, Peking University, and International Center for Machine Learning Research, Peking University, wanglw@cis.pku.edu.cn
Pseudocode | Yes | As a byproduct of our proof, we provide the first provably efficient model-based algorithm for learning infinite-horizon average-reward MDPs with general function approximation (Algorithm 4 in Appendix C.3).
Open Source Code | No | The paper does not include any explicit statements about providing open-source code or links to a code repository.
Open Datasets | No | The paper is theoretical and does not report empirical experiments with specific datasets. It discusses 'training phase' conceptually within its theoretical framework but does not refer to public datasets used for training.
Dataset Splits | No | The paper is theoretical and does not report empirical experiments; therefore, it does not provide details on training/test/validation dataset splits.
Hardware Specification | No | The paper is theoretical and does not discuss the hardware used for any experiments.
Software Dependencies | No | The paper is theoretical and focuses on mathematical proofs and algorithm design. It does not mention any specific software dependencies or versions required to replicate an empirical study.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with concrete hyperparameter values or training configurations.
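
The framework quoted under Research Type (a simulator modeled as a set of MDPs with tunable parameters) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-state MDP, the parameter name `theta`, the reward values, uniform sampling over a parameter grid, and the restriction to deterministic memoryless policies. It picks the policy with the best average value across simulated parameters, then measures how far that policy falls short of the optimum in one "real" MDP.

```python
import itertools
import numpy as np

def make_mdp(theta):
    """Hypothetical 2-state, 2-action MDP; theta in [0, 1] is the tunable parameter."""
    P = np.zeros((2, 2, 2))            # P[s, a, s'] = transition probability
    P[0, 0] = [1.0, 0.0]               # state 0, action 0: stay in state 0
    P[0, 1] = [1.0 - theta, theta]     # state 0, action 1: reach state 1 w.p. theta
    P[1, 0] = [0.0, 1.0]               # state 1, action 0: stay in state 1
    P[1, 1] = [1.0, 0.0]               # state 1, action 1: return to state 0
    R = np.array([[0.3, 0.0],          # R[s, a] = immediate reward
                  [1.0, 0.0]])
    return P, R

def policy_value(P, R, policy, horizon, start=0):
    """Expected total reward of a deterministic memoryless policy over `horizon` steps."""
    v = np.zeros(2)
    for _ in range(horizon):
        v = np.array([R[s, policy[s]] + P[s, policy[s]] @ v for s in range(2)])
    return v[start]

def optimal_value(P, R, horizon, start=0):
    """Optimal finite-horizon value by backward induction."""
    v = np.zeros(2)
    for _ in range(horizon):
        v = (R + P @ v).max(axis=1)
    return v[start]

def domain_randomization(thetas, horizon):
    """Pick the memoryless policy with the best value averaged over sampled parameters."""
    policies = list(itertools.product(range(2), repeat=2))
    avg = {pi: np.mean([policy_value(*make_mdp(t), pi, horizon) for t in thetas])
           for pi in policies}
    return max(avg, key=avg.get)

def sim_to_real_gap(policy, theta_real, horizon):
    """Optimal value in the 'real' MDP minus the value of the given policy there."""
    P, R = make_mdp(theta_real)
    return optimal_value(P, R, horizon) - policy_value(P, R, policy, horizon)

if __name__ == "__main__":
    thetas = np.linspace(0.0, 1.0, 11)   # assumed uniform grid over the parameter
    pi_dr = domain_randomization(thetas, horizon=10)
    print("policy:", pi_dr, "gap:", sim_to_real_gap(pi_dr, theta_real=0.2, horizon=10))
```

The sketch restricts the search to memoryless policies only to keep the enumeration small; a policy that adapts to the observed history could do better across parameters, which this toy example does not attempt to show.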