Sim and Real: Better Together

Authors: Shirli Di-Castro Shashua, Dotan Di Castro, Shie Mannor

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate the efficacy of our method on a sim-to-real environment. We demonstrate our findings in a simulation of sim-to-real, with two simulations, where one is a distorted version of the other, and analyze it empirically." (Section 6, Experimental Evaluation) |
| Researcher Affiliation | Collaboration | Shirli Di-Castro Shashua, Technion - Israel Institute of Technology, Haifa, Israel (shirlidi@technion.ac.il); Shie Mannor, Technion and NVIDIA Research, Israel (shie@technion.ac.il, smannor@nvidia.com); Dotan Di Castro, Bosch Center of AI, Haifa, Israel (dotan.dicastro@il.bosch.com). "This research was conducted during an internship in Bosch Center of AI." |
| Pseudocode | Yes | Algorithm 1: Mixing Sim and Real with Linear Actor Critic (a hedged sketch of the mixing loop follows the table) |
| Open Source Code | Yes | "The code for the experiments is available at: https://github.com/sdicastro/SimAndRealBetterTogether" |
| Open Datasets | Yes | "We evaluate the performance of our proposed algorithm on two Fetch Push environments [37], one acts as the real environment and the other is the simulation environment" (see the environment sketch after the table) |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide training/validation/test split details, such as percentages or sample counts for each split. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper names software components such as DDPG, Hindsight Experience Replay (HER), and the MuJoCo simulator, but gives no version numbers for these or for ancillary dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | "We set K = 2, meaning there is only one real and one simulation environment. We fix optimization parameter βr = 0.5 and test different collection parameters qr = 0, 0.1, 0.3, 0.5, 0.7, 0.9, 1. The agent gets a reward of -1 if the desired goal was not yet achieved and 0 if it was achieved within some tolerance. We repeated each experiment with 10 different random seeds and present the mean and standard deviation values." |
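The Open Datasets row refers to two copies of the Fetch Push task, one playing the real environment and one the simulator. Below is a minimal sketch of how such a pair could be constructed with gym; the distortion knob used here (scaling MuJoCo body masses) is an assumption chosen for illustration, not the paper's exact perturbation.

```python
import gym

# Hypothetical two-environment setup matching the report: both copies use
# gym's FetchPush task (the paper's reference [37]); the simulator copy is
# then distorted so its dynamics no longer match the "real" copy.
real_env = gym.make("FetchPush-v1")  # stands in for the real environment
sim_env = gym.make("FetchPush-v1")   # distorted copy plays the simulator

# Perturb the simulator's physics; body_mass is a mujoco-py model array.
# The 1.2 scaling factor is an arbitrary illustrative choice.
sim_env.unwrapped.sim.model.body_mass[:] *= 1.2

# Both copies keep the sparse reward quoted in the Experiment Setup row:
# -1 until the goal is reached within tolerance, 0 afterwards.
```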
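To make the Pseudocode and Experiment Setup rows concrete, here is a minimal sketch of the mixing scheme they describe. This is not the authors' exact Algorithm 1: it only assumes, per the quoted setup, that the collection parameter q_r decides which environment the next transition comes from and that the optimization parameter βr weights the two losses. The `agent` interface (`act` / `critic_loss` / `update`) is hypothetical.

```python
import random

def run_mixed_training(real_env, sim_env, agent, steps, q_r, beta_r=0.5, seed=0):
    """Sketch of mixed sim/real training with collection parameter q_r
    and optimization parameter beta_r (K = 2 environments)."""
    rng = random.Random(seed)
    envs = {"real": real_env, "sim": sim_env}
    buffers = {"real": [], "sim": []}
    obs = {"real": real_env.reset(), "sim": sim_env.reset()}
    for _ in range(steps):
        # Collection step: with probability q_r step the real environment,
        # otherwise step the simulator.
        key = "real" if rng.random() < q_r else "sim"
        action = agent.act(obs[key])
        next_obs, reward, done, _ = envs[key].step(action)
        buffers[key].append((obs[key], action, reward, next_obs, done))
        obs[key] = envs[key].reset() if done else next_obs
        # Optimization step: beta_r weights the real loss against the
        # simulated loss once both buffers hold at least one transition.
        if buffers["real"] and buffers["sim"]:
            loss = (beta_r * agent.critic_loss(rng.choice(buffers["real"]))
                    + (1 - beta_r) * agent.critic_loss(rng.choice(buffers["sim"])))
            agent.update(loss)

# Sweep mirroring the quoted setup: beta_r fixed at 0.5, seven values of
# q_r, and 10 random seeds per configuration.
# for q_r in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
#     for seed in range(10):
#         run_mixed_training(real_env, sim_env, agent, steps=100_000,
#                            q_r=q_r, beta_r=0.5, seed=seed)
```

The commented-out loop shows how the reported sweep (mean and standard deviation over 10 seeds per q_r value) would be driven; the step budget of 100,000 is a placeholder, not a figure from the paper.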