Self-supervised Visual Reinforcement Learning with Object-centric Representations
Authors: Andrii Zadaianchuk, Maximilian Seitzer, Georg Martius
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have done computational experiments to address the following questions: How well does our method scale to challenging tasks with a large number of objects in case when ground-truth representations are provided? How does our method perform compared to prior visual goal-conditioned RL methods on image-based, multiobject continuous control tasks? How suitable are the representations learned by the compositional generative world model for discovering and solving RL tasks? |
| Researcher Affiliation | Academia | 1 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2 Department of Computer Science, ETH Zurich |
| Pseudocode | Yes | Algorithm 1 SMORL: Self-Supervised Multi-Object RL (Training) ... Algorithm 2 SMORL: Self-Supervised Multi-object RL (Training with Details) ... Algorithm 3 SMORL (Evaluation) |
| Open Source Code | No | Our code, as well as the multi-objects environments will be made public after the paper publication. |
| Open Datasets | No | To answer these questions, we constructed the Multi-Object Visual Push and Multi-Object Visual Rearrange environments. Both environments are based on MuJoCo (Todorov et al., 2012) and the Multiworld package for image-based continuous control tasks introduced by Nair et al. (2018), and contain a 7-dof Sawyer arm where the agent needs to be controlled to manipulate a variable number of small picks on a table. |
| Dataset Splits | No | No explicit statement providing specific training/validation/test dataset splits (percentages, sample counts, or predefined split citations) was found. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running experiments were provided. |
| Software Dependencies | No | The paper mentions software such as PyTorch, MuJoCo, Multiworld, and the Adam optimizer, but does not give version numbers for any of them, which limits reproducibility. |
| Experiment Setup | Yes | We refer to Table 2 for general hyper-parameters of SMORL and to Table 3 for environment specific hyper-parameters of SMORL. |