Regularity as Intrinsic Reward for Free Play
Authors: Cansu Sancaktar, Justus Piater, Georg Martius
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 3, Experiments: "We evaluate RaIR in the two environments shown in Fig. 1." |
| Researcher Affiliation | Academia | Cansu Sancaktar¹, Justus Piater², Georg Martius¹,³. ¹Max Planck Institute for Intelligent Systems, Germany; ²University of Innsbruck, Austria; ³University of Tübingen, Germany |
| Pseudocode | Yes | Algorithm S1 Free Play in Intrinsic Phase (taken from [12]) |
| Open Source Code | Yes | Code and videos are available at https://sites.google.com/view/rair-project. |
| Open Datasets | No | The paper uses custom or extended environments ("Shape Grid World", "Fetch Pick & Place Construction") and generates its own data through interaction during free play, rather than using a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes data collection into a 'replay buffer' during free play and subsequent evaluation on 'downstream tasks', but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) for its experiments. |
| Hardware Specification | Yes | For RaIR + CEE-US, the full free-play (300 training iterations) in CONSTRUCTION with 6 objects, where overall 600K data points are collected, takes roughly 87 hours using a single GPU (NVIDIA GeForce RTX 3060) and 6 cores on an AMD Ryzen 9 5900X processor. |
| Software Dependencies | No | The paper mentions 'Mujoco' as the physics engine for the environments but does not provide specific version numbers for any software, libraries, or frameworks used in the implementation (e.g., Python, PyTorch, TensorFlow, or other packages). |
| Experiment Setup | Yes | The controller parameters used when optimizing RaIR with ground-truth (GT) models are given in Table S4. The GNN model architecture as well as the training parameters for model learning are listed in Table S6. The parameters for the RND module are given in Table S9. |