Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency
Authors: Devendra Singh Chaplot, Murtaza Dalal, Saurabh Gupta, Jitendra Malik, Russ R. Salakhutdinov
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the SEAL framework can be used to close the action-perception loop: it improves object detection and instance segmentation performance of a pretrained perception model (Mask RCNN [19]) from 34.82/32.54 AP50 scores to 40.02/36.23 AP50 scores by just moving around in training environments, without having access to any additional human annotations. |
| Researcher Affiliation | Collaboration | 1Facebook AI Research, 2Carnegie Mellon University, 3UIUC, 4UC Berkeley |
| Pseudocode | No | No explicit pseudocode or algorithm blocks found. |
| Open Source Code | No | Project Webpage: https://devendrachaplot.github.io/projects/seal |
| Open Datasets | Yes | We use the Habitat simulator [40] with the Gibson dataset [50] for our experiments. The Gibson dataset consists of scenes that are 3D reconstructions of real-world environments. We use a set of 30 scenes from the Gibson tiny set for our experiments whose semantic annotations are available from Armeni et al. [5]. The Mask-RCNN is pretrained on the MS-COCO dataset [30] for object detection and instance segmentation. |
| Dataset Splits | No | We use a split of 25 and 5 scenes for training and testing identical to prior work [9]. |
| Hardware Specification | No | Ruslan Salakhutdinov would also like to acknowledge NVIDIA's GPU support. |
| Software Dependencies | No | All other hyperparameters are set to default settings in Detectron2 [49]. Our PPO implementation is based on [29]. |
| Experiment Setup | Yes | We use Stochastic Gradient Descent [7] with a fixed learning rate of 0.0001 for N = 5000 iterations. The policy is trained with the Gainful Curiosity reward which is computed by counting the number of voxels explored with ŝ (= 0.9) score for at least one object category. We use Adam optimizer with a learning rate of 0.000025, a discount factor of γ = 0.99, an entropy coefficient of 0.001, value loss coefficient of 0.5 for training the Global Policy. |
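
The values quoted in the Experiment Setup row can be gathered into a small configuration sketch for anyone attempting a reproduction. This is an illustration only, not the authors' code: the class names, field names, and the `gainful_curiosity_count` helper are hypothetical, and reading "explored with ŝ score" as "best category score at or above 0.9" is an assumption. The row does not say how the count becomes a per-step reward, so the helper only counts voxels in a toy map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerceptionFinetuneConfig:
    """Perception-model fine-tuning values quoted in the table (field names are hypothetical)."""
    optimizer: str = "SGD"
    learning_rate: float = 1e-4   # "fixed learning rate of 0.0001"
    iterations: int = 5000        # "N = 5000 iterations"


@dataclass(frozen=True)
class GlobalPolicyConfig:
    """Global Policy training values quoted in the table (field names are hypothetical)."""
    optimizer: str = "Adam"
    learning_rate: float = 2.5e-5  # "learning rate of 0.000025"
    discount_gamma: float = 0.99
    entropy_coef: float = 0.001
    value_loss_coef: float = 0.5


def gainful_curiosity_count(voxel_scores, threshold=0.9):
    """Count voxels whose highest per-category score reaches the threshold.

    voxel_scores: iterable of per-voxel lists, one score per object category.
    Assumption: a voxel qualifies when max(scores) >= threshold.
    Voxels with no scores are skipped.
    """
    return sum(1 for scores in voxel_scores if scores and max(scores) >= threshold)


if __name__ == "__main__":
    # Toy map: four voxels, two object categories each (the last voxel is unobserved).
    toy_map = [[0.95, 0.10], [0.20, 0.30], [0.90, 0.00], []]
    print(PerceptionFinetuneConfig())
    print(GlobalPolicyConfig())
    print(gainful_curiosity_count(toy_map))
```

Details the table marks as unreported, such as hardware and software versions, are deliberately left out of the sketch rather than guessed.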