Sim and Real: Better Together
Authors: Shirli Di-Castro, Dotan Di Castro, Shie Mannor
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our method on a sim-to-real environment. We demonstrate our findings in a simulation of sim-to-real, with two simulations where one is a distorted version of the other and analyze it empirically. (Section 6, Experimental Evaluation.) |
| Researcher Affiliation | Collaboration | Shirli Di Castro Shashua, Technion Institute of Technology, Haifa, Israel (shirlidi@technion.ac.il); Shie Mannor, Technion and NVIDIA Research, Israel (shie@technion.ac.il, smannor@nvidia.com); Dotan Di Castro, Bosch Center of AI, Haifa, Israel (dotan.dicastro@il.bosch.com). This research was conducted during an internship in Bosch Center of AI. |
| Pseudocode | Yes | Algorithm 1: Mixing Sim and Real with Linear Actor Critic (a hedged sketch of the mixing idea follows this table). |
| Open Source Code | Yes | The code for the experiments is available at https://github.com/sdicastro/SimAndRealBetterTogether. |
| Open Datasets | Yes | We evaluate the performance of our proposed algorithm on two Fetch Push environments [37]; one acts as the real environment and the other as the simulation environment. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about training/validation/test dataset splits, such as percentages or sample counts for each split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as DDPG, Hindsight Experience Replay (HER), and Mujoco simulator, but it does not provide specific version numbers for these or other ancillary software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | We set K = 2, meaning there is only one real and one simulation environment. We fix the optimization parameter βr = 0.5 and test the collection parameter qr = 0, 0.1, 0.3, 0.5, 0.7, 0.9, 1. The agent gets a reward of -1 if the desired goal was not yet achieved and 0 if it was achieved within some tolerance. We repeated each experiment with 10 different random seeds and present the mean and standard deviation values. (See the configuration sketch after this table.) |
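
The Pseudocode row names Algorithm 1, which mixes transitions from simulated and real environments in a linear actor-critic. As a rough illustration of what such mixing can look like in replay-based training, the Python sketch below uses a collection parameter `q_real` (analogous to qr) to decide which environment to interact with, and an optimization parameter `beta_real` (analogous to βr) to set the fraction of real transitions in each training batch. The environment, policy, and buffer interfaces are hypothetical placeholders; this is not a reproduction of the authors' Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_step(real_env, sim_env, policy, real_buffer, sim_buffer, q_real=0.5):
    """With probability q_real, interact with the real environment; otherwise with the simulator.

    `real_env`, `sim_env`, and `policy` are assumed to expose a simple
    `env.step(policy) -> transition` interface (illustrative only).
    """
    env, buffer = (real_env, real_buffer) if rng.random() < q_real else (sim_env, sim_buffer)
    buffer.append(env.step(policy))

def mixed_batch(real_buffer, sim_buffer, batch_size=128, beta_real=0.5):
    """Build a training batch in which a fraction beta_real of the transitions is real."""
    n_real = int(round(beta_real * batch_size))
    n_sim = batch_size - n_real
    real_part = [real_buffer[i] for i in rng.integers(len(real_buffer), size=n_real)]
    sim_part = [sim_buffer[i] for i in rng.integers(len(sim_buffer), size=n_sim)]
    return real_part + sim_part
```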
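
The Experiment Setup row can be read as a small configuration sweep. The sketch below mirrors the quoted values (K = 2, βr = 0.5, the qr grid, 10 seeds) and the sparse -1/0 goal reward; the goal tolerance of 0.05 is an assumed value based on standard Fetch tasks, not a number stated in the row.

```python
import numpy as np

# Configuration mirroring the quoted experiment setup.
config = {
    "num_envs": 2,                                         # K = 2: one real and one simulation environment
    "beta_real": 0.5,                                      # fixed optimization parameter (beta_r)
    "q_real_sweep": [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0],   # collection parameter values (q_r)
    "num_seeds": 10,                                       # repetitions for mean / std reporting
}

def sparse_goal_reward(achieved_goal, desired_goal, tolerance=0.05):
    """Return 0 if the goal is reached within `tolerance`, otherwise -1.

    The 0.05 tolerance is an assumption (typical for Fetch environments),
    not a value taken from the quoted setup.
    """
    dist = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal))
    return 0.0 if dist <= tolerance else -1.0
```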