Domain Randomization via Entropy Maximization
Authors: Gabriele Tiboni, Pascal Klink, Jan Peters, Tatiana Tommasi, Carlo D'Eramo, Georgia Chalvatzaki
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters. |
| Researcher Affiliation | Academia | 1Department of Control and Computer Engineering, Politecnico di Torino, Italy 2Department of Computer Science, Technische Universität Darmstadt, Germany 3Center for Artificial Intelligence and Data Science, University of Würzburg, Germany 4Hessian Center for Artificial Intelligence (Hessian.AI), Darmstadt, Germany 5Centre for Cognitive Science, TU Darmstadt, Germany. 6Systems AI for Robot Learning, German Research Center for AI (DFKI) 7Center for Mind, Brain and Behavior, Uni. Marburg and JLU Giessen, Germany |
| Pseudocode | Yes | Algorithm 1: Domain Randomization via Entropy Maximization (DORAEMON) |
| Open Source Code | Yes | Refer to our public code implementation at https://gabrieletiboni.github.io/doraemon/ for the full reproducibility of our experimental evaluation. |
| Open Datasets | Yes | We conduct a thorough experimental evaluation of DORAEMON on six benchmark tasks in simulation, from the OpenAI Gym (Brockman et al., 2016) MuJoCo environments. |
| Dataset Splits | No | The paper discusses 'training data' and 'test sets' (Sim2Sim and Sim2Real) but does not explicitly define training/validation/test splits or mention a separate validation set with specific percentages or counts for reproducibility. |
| Hardware Specification | No | The authors gratefully acknowledge the scientific support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) under the NHR project b187cb. NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the German Research Foundation (DFG) 440719683. We further acknowledge the support of the European H2020 Elise project (www.elise-ai.eu), for the availability of HPC resources and support. |
| Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC), OpenAI Gym, MuJoCo, and SciPy, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We then individually tune the hyperparameters of the considered baselines for a fair comparison: for each method, we perform a grid-search over its hyperparameters to obtain optimal average performance across all tasks; then, we separately tune a single selected hyperparameter per method on each environment individually. In particular, we choose to separately tune α for LSDR, for Auto DR, and ϵ for DORAEMON (the notation here follows each paper's respective notation). Note that these parameters generally regulate the pace of the growing distribution. We query the policy at a frequency of 50Hz, and follow the resulting low-level 20 ms trajectory at 1000Hz. |
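To make the "pace of the growing distribution" concrete, below is a minimal, hypothetical sketch of an entropy-growing domain-randomization update in the spirit of Algorithm 1. It is not the paper's actual method: DORAEMON maximizes the entropy of a parameterized sampling distribution under a success-rate constraint via constrained optimization, whereas this sketch approximates the idea with a uniform interval that widens (entropy up) when the policy's success rate clears a threshold and shrinks otherwise. The function name `doraemon_update` and the `success_threshold` and `step` parameters are illustrative assumptions, with `step` playing the role of the pace hyperparameter tuned per environment.

```python
import numpy as np

def doraemon_update(lower, upper, success_rate,
                    success_threshold=0.5, step=0.05,
                    min_bounds=(0.0,), max_bounds=(1.0,)):
    """Hypothetical sketch of an entropy-growing DR update.

    `lower`/`upper` bound a uniform distribution over dynamics
    parameters; its entropy grows monotonically with interval width.
    `step` controls the pace at which the distribution grows.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if success_rate >= success_threshold:
        # Policy solves the current range often enough:
        # widen the interval to increase entropy.
        lower = np.maximum(lower - step, min_bounds)
        upper = np.minimum(upper + step, max_bounds)
    else:
        # Current range is too hard: shrink back toward the midpoint
        # so training focuses on a feasible region first.
        mid = 0.5 * (lower + upper)
        lower = np.minimum(lower + step, mid)
        upper = np.maximum(upper - step, mid)
    return lower, upper
```

In the actual algorithm the trade-off is handled more carefully (a trust-region-style constrained update rather than a fixed step), but the sketch captures why a single pace parameter per method is the natural quantity to tune per environment.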