Exploration via Elliptical Episodic Bonuses

Authors: Mikael Henaff, Roberta Raileanu, Minqi Jiang, Tim Rocktäschel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiments: For all our experiments we use the Torchbeast [40] implementation of IMPALA [23] as our base RL algorithm. For certain skill-based tasks, we restricted the action space to the necessary actions for solving the task at hand, since we found that none of the methods were able to make progress with the full action space (see Appendix C.1.4). See Appendix C.1.3 for environment details and C.1 for other experiment details.
Researcher Affiliation | Collaboration | Mikael Henaff, Meta AI Research, mikaelhenaff@meta.com; Roberta Raileanu, Meta AI Research, raileanu@meta.com; Minqi Jiang, University College London and Meta AI Research, meta@fb.com; Tim Rocktäschel, University College London, t.rocktaschel@cs.ucl.ac.uk
Pseudocode | Yes | Algorithm 1: Exploration via Episodic Elliptical Bonuses (E3B)
Open Source Code | Yes | Our code is available at https://github.com/facebookresearch/e3b.
Open Datasets | Yes | We use the HM3D dataset [57], which contains high-quality renditions of 1000 different indoor spaces.
Dataset Splits | Yes | The 1000 scenes of the HM3D dataset are split into 800 training, 100 validation, and 100 test scenes based on the splits released with the dataset.
Hardware Specification | Yes | All experiments were run on an internal cluster of machines equipped with 8 V100 GPUs and 192GB RAM.
Software Dependencies | No | The information is insufficient: the paper mentions software such as 'Torchbeast' and 'PyTorch' but does not specify version numbers for these or other dependencies.
Experiment Setup | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Appendix C.