SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
Authors: Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Russ R. Salakhutdinov
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model (Mask RCNN [19]) from 34.82/32.54 AP50 scores to 40.02/36.23 AP50 scores by just moving around in training environments, without having access to any additional human annotations. |
| Researcher Affiliation | Collaboration | 1Facebook AI Research, 2Carnegie Mellon University, 3UIUC, 4UC Berkeley |
| Pseudocode | No | No explicit pseudocode or algorithm blocks found. |
| Open Source Code | No | Project Webpage: https://devendrachaplot.github.io/projects/seal |
| Open Datasets | Yes | We use the Habitat simulator [40] with the Gibson dataset [50] for our experiments. The Gibson dataset consists of scenes that are 3D reconstructions of real-world environments. We use a set of 30 scenes from the Gibson tiny set for our experiments whose semantic annotations are available from Armeni et al. [5]. The Mask-RCNN is pretrained on the MS-COCO dataset [30] for object detection and instance segmentation. |
| Dataset Splits | No | We use a split of 25 and 5 scenes for training and testing identical to prior work [9]. |
| Hardware Specification | No | Ruslan Salakhutdinov would also like to acknowledge NVIDIA's GPU support. |
| Software Dependencies | No | All other hyperparameters are set to default settings in Detectron2 [49]. Our PPO implementation is based on [29]. |
| Experiment Setup | Yes | We use Stochastic Gradient Descent [7] with a fixed learning rate of 0.0001 for N = 5000 iterations. The policy is trained with the Gainful Curiosity reward, which is computed by counting the number of voxels explored with ŝ (= 0.9) score for at least one object category. We use the Adam optimizer with a learning rate of 0.000025, a discount factor of γ = 0.99, an entropy coefficient of 0.001, and a value loss coefficient of 0.5 for training the Global Policy. (Hedged configuration sketches based on these settings follow the table.) |
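
The perception fine-tuning settings quoted under Experiment Setup and Software Dependencies map onto a Detectron2 training configuration. Below is a minimal sketch under stated assumptions: the Mask R-CNN config file, the self-labeled dataset name, and the training launch code are illustrative placeholders, not details reported in the paper; only the optimizer (SGD), the fixed learning rate of 0.0001, the 5000 iterations, and "all other hyperparameters at Detectron2 defaults" come from the quotes above.

```python
# Minimal Detectron2 fine-tuning sketch (assumptions noted in comments).
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Assumption: a standard COCO-pretrained Mask R-CNN config; the paper does not
# state which backbone/config file SEAL starts from.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # MS-COCO pretraining [30]
# Hypothetical name for the self-generated, 3D-consistent labels; it would need
# to be registered with detectron2.data.DatasetCatalog before training.
cfg.DATASETS.TRAIN = ("seal_self_labeled_train",)
cfg.DATASETS.TEST = ()
cfg.SOLVER.BASE_LR = 0.0001   # fixed SGD learning rate from the quote above
cfg.SOLVER.MAX_ITER = 5000    # N = 5000 iterations
# All other options stay at Detectron2 defaults, per the Software Dependencies quote.

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```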
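
The Global Policy hyperparameters in the Experiment Setup row can be collected into a similar sketch. This assumes the standard PPO objective (policy loss plus weighted value loss minus weighted entropy bonus) as used by the implementation cited as [29]; the placeholder network and loss tensors are illustrative, not code from the paper.

```python
import torch

# Hyperparameters quoted above for training the Global Policy.
LR = 0.000025          # Adam learning rate
GAMMA = 0.99           # discount factor (used when computing returns/advantages)
ENTROPY_COEF = 0.001   # entropy bonus coefficient
VALUE_LOSS_COEF = 0.5  # value loss coefficient

def ppo_total_loss(action_loss: torch.Tensor,
                   value_loss: torch.Tensor,
                   dist_entropy: torch.Tensor) -> torch.Tensor:
    """Combine the PPO terms with the coefficients above (standard PPO weighting)."""
    return action_loss + VALUE_LOSS_COEF * value_loss - ENTROPY_COEF * dist_entropy

# Placeholder for the Global Policy network; the actual architecture is described
# in the paper and not reproduced here.
global_policy = torch.nn.Linear(256, 2)
optimizer = torch.optim.Adam(global_policy.parameters(), lr=LR)
```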