SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency

Authors: Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Russ R. Salakhutdinov

Venue: NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model (Mask RCNN [19]) from 34.82/32.54 AP50 scores to 40.02/36.23 AP50 scores by just moving around in training environments, without having access to any additional human annotations."
Researcher Affiliation | Collaboration | Facebook AI Research, Carnegie Mellon University, UIUC, UC Berkeley
Pseudocode | No | No explicit pseudocode or algorithm blocks found.
Open Source Code | No | Project Webpage: https://devendrachaplot.github.io/projects/seal
Open Datasets | Yes | "We use the Habitat simulator [40] with the Gibson dataset [50] for our experiments. The Gibson dataset consists of scenes that are 3D reconstructions of real-world environments. We use a set of 30 scenes from the Gibson tiny set for our experiments whose semantic annotations are available from Armeni et al. [5]. The Mask-RCNN is pretrained on the MS-COCO dataset [30] for object detection and instance segmentation."
Dataset Splits | No | "We use a split of 25 and 5 scenes for training and testing identical to prior work [9]."
Hardware Specification | No | "Ruslan Salakhutdinov would also like to acknowledge NVIDIA's GPU support."
Software Dependencies | No | "All other hyperparameters are set to default settings in Detectron2 [49]. Our PPO implementation is based on [29]."
Experiment Setup | Yes | "We use Stochastic Gradient Descent [7] with a fixed learning rate of 0.0001 for N = 5000 iterations. The policy is trained with the Gainful Curiosity reward, which is computed by counting the number of voxels explored with a score ŝ ≥ 0.9 for at least one object category. We use the Adam optimizer with a learning rate of 0.000025, a discount factor of γ = 0.99, an entropy coefficient of 0.001, and a value loss coefficient of 0.5 for training the Global Policy."
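
The Software Dependencies and Experiment Setup rows above quote the Mask R-CNN fine-tuning recipe: Detectron2 defaults, SGD with a fixed learning rate of 0.0001 for 5000 iterations, starting from MS-COCO pretrained weights. The following is a minimal sketch of that recipe in Detectron2; the backbone choice (mask_rcnn_R_50_FPN_3x) and the dataset name "seal_self_labels_train" are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (not the authors' code): fine-tune a COCO-pretrained Mask R-CNN
# with Detectron2 using the hyperparameters quoted in the Experiment Setup row.
# Assumptions: the R_50_FPN_3x backbone and the hypothetical dataset name
# "seal_self_labels_train" (registration of that dataset is not shown).
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Start from a default Detectron2 Mask R-CNN config (assumed backbone).
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
# Initialize from MS-COCO pretrained weights, as stated in the paper.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("seal_self_labels_train",)  # hypothetical self-labeled data
cfg.DATASETS.TEST = ()
# Quoted setup: SGD with a fixed learning rate of 0.0001 for N = 5000 iterations.
cfg.SOLVER.BASE_LR = 0.0001
cfg.SOLVER.MAX_ITER = 5000
cfg.SOLVER.STEPS = []  # no decay steps, so the learning rate stays fixed
# "All other hyperparameters are set to default settings in Detectron2."

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```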
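The Experiment Setup row also describes the Gainful Curiosity reward (counting voxels explored with a score ŝ ≥ 0.9 for at least one object category) and the Adam hyperparameters used to train the Global Policy with PPO. The snippet below is a rough sketch under those quoted values; the (categories, X, Y, Z) voxel-score tensor layout and the function name are assumptions.

```python
# Rough sketch (not the authors' code): Gainful Curiosity reward as the count of
# newly explored voxels whose maximum per-category score reaches 0.9, plus the
# quoted PPO/Adam hyperparameters for the Global Policy.
import torch

SCORE_THRESHOLD = 0.9  # quoted as s-hat >= 0.9 for at least one object category


def gainful_curiosity_reward(voxel_scores: torch.Tensor,
                             already_counted: torch.Tensor):
    """Count voxels whose max category score crosses the threshold for the first time.

    voxel_scores: (num_categories, X, Y, Z) semantic scores of the 3D map.
    already_counted: (X, Y, Z) boolean mask of voxels rewarded in earlier steps.
    """
    confident = voxel_scores.max(dim=0).values >= SCORE_THRESHOLD
    newly_confident = confident & ~already_counted
    reward = newly_confident.sum().item()
    return reward, already_counted | newly_confident


# Quoted Global Policy training hyperparameters (PPO implementation based on [29]):
ppo_hparams = dict(
    lr=0.000025,          # Adam learning rate
    gamma=0.99,           # discount factor
    entropy_coef=0.001,   # entropy coefficient
    value_loss_coef=0.5,  # value loss coefficient
)
# e.g. optimizer = torch.optim.Adam(global_policy.parameters(), lr=ppo_hparams["lr"])
```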