Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots

Authors: Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate that learned robot policies lead to efficient task completion when collaborating with unseen humanoid agents and human partners that might exhibit behaviors that the robot has not seen before."
Researcher Affiliation | Collaboration | "core team, project leads. Work done at FAIR, Meta. (...) Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi"
Pseudocode | No | The paper describes algorithms and procedures in prose and through diagrams, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our framework is open-sourced, for more details see Appendix A."
Open Datasets | Yes | "We incorporate the Habitat Synthetic Scenes Dataset (HSSD) (Khanna et al., 2023) in Habitat 3.0, and use the Boston Dynamics (BD) Spot robot."
Dataset Splits | Yes | "We use 37 train, 12 validation and 10 test scenes. For details on the robot and scenes, refer to Appendix D and E. (...) Specifically, we use 37 scenes for training, sampling 1000 episodes per scene, 12 for validation, with 100 episodes per scene, and 10 for test, with 15 episodes in each scene." (See the split sketch after the table.)
Hardware Specification | Yes | "We train all the end-to-end RL social navigation baselines using DD-PPO (Wijmans et al., 2019), distributing training across 4 NVIDIA A100 GPUs. (...) All tests are conducted on a single NVIDIA V100 GPU." (See the rollout arithmetic after the table.)
Software Dependencies | No | The paper mentions software such as DD-PPO, Adam, LSTM, and ResNet-18, and the Python programming language, but it does not provide version numbers for any of these dependencies. (See the pinning sketch after the table.)
Experiment Setup | Yes | "Each GPU runs 24 parallel environments, and collects 128 steps for each update. We use a long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997) policy with ResNet-18 as the visual backbone and two recurrent layers, resulting in nearly 8,517k parameters. We use a learning rate of 1e-4 and a maximum gradient norm of 0.2. (...) We train with Adam (Kingma & Ba, 2014) using a learning rate of 2.5e-4. We use 2 PPO minibatches and 1 epoch per update, an entropy loss of 1e-4, and clip the gradient norm to 0.2." (See the config sketch after the table.)
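
The split counts in the Dataset Splits row pin down the episode totals per split. A minimal sketch that tabulates them; the dictionary layout and variable names are ours, not Habitat 3.0's actual episode-generation code:

```python
# Scene/episode splits as quoted in the paper (values from the text;
# the structure of this script is illustrative only).
SPLITS = {
    "train": {"scenes": 37, "episodes_per_scene": 1000},
    "val": {"scenes": 12, "episodes_per_scene": 100},
    "test": {"scenes": 10, "episodes_per_scene": 15},
}

for split, cfg in SPLITS.items():
    total = cfg["scenes"] * cfg["episodes_per_scene"]
    print(f"{split}: {cfg['scenes']} scenes x "
          f"{cfg['episodes_per_scene']} episodes/scene = {total} episodes")
# train: 37 scenes x 1000 episodes/scene = 37000 episodes
# val: 12 scenes x 100 episodes/scene = 1200 episodes
# test: 10 scenes x 15 episodes/scene = 150 episodes
```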
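
Combining the Hardware Specification and Experiment Setup rows fixes the effective rollout size per PPO update. A quick check of that arithmetic (variable names are ours):

```python
# Rollout size implied by the reported setup: 4 GPUs, 24 parallel
# environments per GPU, 128 steps collected per update.
num_gpus = 4
envs_per_gpu = 24
steps_per_update = 128

transitions_per_update = num_gpus * envs_per_gpu * steps_per_update
print(transitions_per_update)  # 12288 environment steps per update
```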
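
Since the Software Dependencies row finds no version numbers, here is a hypothetical check showing the kind of pinning that would resolve the item. The package names follow the stack the paper implies (PyTorch for the DD-PPO/ResNet-18 training, the Habitat simulator); every version number below is an assumption, not a figure from the paper:

```python
# Hypothetical dependency pins; all versions are assumptions for illustration.
from importlib.metadata import PackageNotFoundError, version

PINNED = {
    "torch": "2.1.0",  # assumed, not reported in the paper
    "torchvision": "0.16.0",  # assumed
    "habitat-sim": "0.3.0",  # assumed
}

for pkg, expected in PINNED.items():
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        installed = "not installed"
    print(f"{pkg}: pinned={expected}, installed={installed}")
```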
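
Finally, the hyperparameters quoted in the Experiment Setup row can be gathered into one configuration object. A hedged sketch: the field names are ours, and only the values come from the quoted text (the 1e-4 learning rate is the one reported for the end-to-end social navigation baselines):

```python
# Training configuration reconstructed from the quoted text. Field names
# are illustrative; this is not the paper's actual config schema.
from dataclasses import dataclass


@dataclass
class PPOTrainConfig:
    visual_backbone: str = "resnet18"
    rnn_type: str = "LSTM"
    rnn_num_layers: int = 2
    optimizer: str = "Adam"
    lr: float = 2.5e-4  # 1e-4 reported for the social-nav baselines
    num_minibatches: int = 2
    epochs_per_update: int = 1
    entropy_coef: float = 1e-4
    max_grad_norm: float = 0.2
    envs_per_gpu: int = 24
    steps_per_update: int = 128


print(PPOTrainConfig())
```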