Domain Randomization via Entropy Maximization
Authors: Gabriele Tiboni, Pascal Klink, Jan Peters, Tatiana Tommasi, Carlo D'Eramo, Georgia Chalvatzaki
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters. |
| Researcher Affiliation | Academia | 1Department of Control and Computer Engineering, Politecnico di Torino, Italy 2Department of Computer Science, Technische Universität Darmstadt, Germany 3Center for Artificial Intelligence and Data Science, University of Würzburg, Germany 4Hessian Center for Artificial Intelligence (Hessian.AI), Darmstadt, Germany 5Centre for Cognitive Science, TU Darmstadt, Germany. 6Systems AI for Robot Learning, German Research Center for AI (DFKI) 7Center for Mind, Brain and Behavior, Uni. Marburg and JLU Giessen, Germany |
| Pseudocode | Yes | Algorithm 1: Domain Randomization via Entropy Maximization (DORAEMON) |
| Open Source Code | Yes | Refer to our public code implementation at https://gabrieletiboni.github.io/doraemon/ for the full reproducibility of our experimental evaluation. |
| Open Datasets | Yes | We conduct a thorough experimental evaluation of DORAEMON on six benchmark tasks in simulation, from the OpenAI Gym (Brockman et al., 2016) MuJoCo environments. |
| Dataset Splits | No | The paper discusses 'training data' and 'test sets' (Sim2Sim and Sim2Real) but does not explicitly define training/validation/test splits or mention a separate validation set with specific percentages or counts for reproducibility. |
| Hardware Specification | No | The authors gratefully acknowledge the scientific support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) under the NHR project b187cb. NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the German Research Foundation (DFG) 440719683. We further acknowledge the support of the European H2020 Elise project (www.elise-ai.eu), for the availability of HPC resources and support. |
| Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC), OpenAI Gym, MuJoCo, and SciPy, but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We then individually tune the hyperparameters of the considered baselines for a fair comparison: for each method, we perform a grid-search over its hyperparameters to obtain optimal average performance across all tasks; then, we separately tune a single selected hyperparameter per method on each environment individually. In particular, we choose to separately tune α for LSDR, for Auto DR, and ϵ for DORAEMON (the notation here follows each paper's respective notation). Note that these parameters generally regulate the pace of the growing distribution. We query the policy at a frequency of 50Hz, and follow the resulting low-level 20 ms trajectory at 1000Hz. |
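To make the "pace of the growing distribution" concrete, below is a minimal, hypothetical sketch of an entropy-growing domain-randomization update in the spirit of Algorithm 1. It is not the paper's actual method: DORAEMON maximizes the entropy of a parameterized sampling distribution under a success-rate constraint via constrained optimization, whereas this sketch approximates the idea with a uniform interval that widens (entropy up) when the policy's success rate clears a threshold and shrinks otherwise. The function name `doraemon_update` and the `success_threshold` and `step` parameters are illustrative assumptions, with `step` playing the role of the pace hyperparameter tuned per environment.

```python
import numpy as np

def doraemon_update(lower, upper, success_rate,
                    success_threshold=0.5, step=0.05,
                    min_bounds=(0.0,), max_bounds=(1.0,)):
    """Hypothetical sketch of an entropy-growing DR update.

    `lower`/`upper` bound a uniform distribution over dynamics
    parameters; its entropy grows monotonically with interval width.
    `step` controls the pace at which the distribution grows.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if success_rate >= success_threshold:
        # Policy solves the current range often enough:
        # widen the interval to increase entropy.
        lower = np.maximum(lower - step, min_bounds)
        upper = np.minimum(upper + step, max_bounds)
    else:
        # Current range is too hard: shrink back toward the midpoint
        # so training focuses on a feasible region first.
        mid = 0.5 * (lower + upper)
        lower = np.minimum(lower + step, mid)
        upper = np.maximum(upper - step, mid)
    return lower, upper
```

In the actual algorithm the trade-off is handled more carefully (a trust-region-style constrained update rather than a fixed step), but the sketch captures why a single pace parameter per method is the natural quantity to tune per environment.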