SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency

Authors: Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Russ R. Salakhutdinov

Venue: NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model (Mask RCNN [19]) from 34.82/32.54 AP50 scores to 40.02/36.23 AP50 scores by just moving around in training environments, without having access to any additional human annotations."
Researcher Affiliation | Collaboration | Facebook AI Research, Carnegie Mellon University, UIUC, UC Berkeley
Pseudocode | No | No explicit pseudocode or algorithm blocks found.
Open Source Code | No | Project Webpage: https://devendrachaplot.github.io/projects/seal
Open Datasets | Yes | "We use the Habitat simulator [40] with the Gibson dataset [50] for our experiments. The Gibson dataset consists of scenes that are 3D reconstructions of real-world environments. We use a set of 30 scenes from the Gibson tiny set for our experiments whose semantic annotations are available from Armeni et al. [5]. The Mask-RCNN is pretrained on the MS-COCO dataset [30] for object detection and instance segmentation."
Dataset Splits | No | "We use a split of 25 and 5 scenes for training and testing identical to prior work [9]."
Hardware Specification | No | "Ruslan Salakhutdinov would also like to acknowledge NVIDIA's GPU support."
Software Dependencies | No | "All other hyperparameters are set to default settings in Detectron2 [49]. Our PPO implementation is based on [29]."
Experiment Setup | Yes | "We use Stochastic Gradient Descent [7] with a fixed learning rate of 0.0001 for N = 5000 iterations. The policy is trained with the Gainful Curiosity reward, which is computed by counting the number of voxels explored with a score ŝ ≥ 0.9 for at least one object category. We use the Adam optimizer with a learning rate of 0.000025, a discount factor of γ = 0.99, an entropy coefficient of 0.001, and a value loss coefficient of 0.5 for training the Global Policy."
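
The Software Dependencies and Experiment Setup rows above quote the Mask R-CNN fine-tuning recipe: Detectron2 defaults, SGD with a fixed learning rate of 0.0001 for 5000 iterations, starting from MS-COCO pretrained weights. The following is a minimal sketch of that recipe in Detectron2; the backbone choice (mask_rcnn_R_50_FPN_3x) and the dataset name "seal_self_labels_train" are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (not the authors' code): fine-tune a COCO-pretrained Mask R-CNN
# with Detectron2 using the hyperparameters quoted in the Experiment Setup row.
# Assumptions: the R_50_FPN_3x backbone and the hypothetical dataset name
# "seal_self_labels_train" (registration of that dataset is not shown).
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
# Start from a default Detectron2 Mask R-CNN config (assumed backbone).
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
# Initialize from MS-COCO pretrained weights, as stated in the paper.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("seal_self_labels_train",)  # hypothetical self-labeled data
cfg.DATASETS.TEST = ()
# Quoted setup: SGD with a fixed learning rate of 0.0001 for N = 5000 iterations.
cfg.SOLVER.BASE_LR = 0.0001
cfg.SOLVER.MAX_ITER = 5000
cfg.SOLVER.STEPS = []  # no decay steps, so the learning rate stays fixed
# "All other hyperparameters are set to default settings in Detectron2."

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```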
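The Experiment Setup row also describes the Gainful Curiosity reward (counting voxels explored with a score ŝ ≥ 0.9 for at least one object category) and the Adam hyperparameters used to train the Global Policy with PPO. The snippet below is a rough sketch under those quoted values; the (categories, X, Y, Z) voxel-score tensor layout and the function name are assumptions.

```python
# Rough sketch (not the authors' code): Gainful Curiosity reward as the count of
# newly explored voxels whose maximum per-category score reaches 0.9, plus the
# quoted PPO/Adam hyperparameters for the Global Policy.
import torch

SCORE_THRESHOLD = 0.9  # quoted as s-hat >= 0.9 for at least one object category


def gainful_curiosity_reward(voxel_scores: torch.Tensor,
                             already_counted: torch.Tensor):
    """Count voxels whose max category score crosses the threshold for the first time.

    voxel_scores: (num_categories, X, Y, Z) semantic scores of the 3D map.
    already_counted: (X, Y, Z) boolean mask of voxels rewarded in earlier steps.
    """
    confident = voxel_scores.max(dim=0).values >= SCORE_THRESHOLD
    newly_confident = confident & ~already_counted
    reward = newly_confident.sum().item()
    return reward, already_counted | newly_confident


# Quoted Global Policy training hyperparameters (PPO implementation based on [29]):
ppo_hparams = dict(
    lr=0.000025,          # Adam learning rate
    gamma=0.99,           # discount factor
    entropy_coef=0.001,   # entropy coefficient
    value_loss_coef=0.5,  # value loss coefficient
)
# e.g. optimizer = torch.optim.Adam(global_policy.parameters(), lr=ppo_hparams["lr"])
```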