Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR.
Researcher Affiliation | Collaboration | 1University of Washington, 2Allen Institute for Artificial Intelligence
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code and pretrained models are publicly available through the project page: https://embodied-codebook.github.io
Open Datasets | Yes | Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able to generalize better and converge faster when adapted to other simulation environments such as Habitat.
Dataset Splits | Yes | We train our agent by DD-PPO (Wijmans et al., 2019) with 80 samplers for 20M steps on the APND dataset (Ehsani et al., 2021) in 30 kitchen scenes, which we split into 20 training scenes, 5 validation scenes, and 5 testing scenes.
Hardware Specification | Yes | To facilitate this, we utilize multi-node training on 4 servers, each equipped with eight A100 80GB GPUs.
Software Dependencies | No | The paper mentions several software frameworks and algorithms (e.g., AllenAct, DD-PPO, PPO, Adam) but does not specify their versions or other dependencies.
Experiment Setup | Yes | All models are trained using the AllenAct framework. We follow (Deitke et al., 2022b) to pretrain the EmbCLIP baseline and EmbCLIP-Codebook on the PROCTHOR-10k houses with 96 samplers for 435M steps. ...we optimize the models with the Adam (Kingma & Ba, 2014) optimizer and a fixed learning rate of 3e-4. In addition, we follow (Deitke et al., 2022b) in using two warm-up stages that train model parameters with a lower number of steps per batch for PPO training (Schulman et al., 2017). In the first stage, we set the number of steps to 32; in the second stage, we increase it to 64. These two stages are trained for 1M steps each. After the second stage, we increase the number of steps to 128 and keep it until the end of training (435M steps).
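The 30-scene 20/5/5 partition quoted in the Dataset Splits row can be sketched as follows. The scene identifiers and the seeded shuffle are illustrative assumptions; the paper does not describe how its split was drawn from the APND kitchen scenes.

```python
import random

def split_kitchen_scenes(scenes, n_train=20, n_val=5, n_test=5, seed=0):
    """Partition a list of 30 kitchen scenes into train/val/test splits."""
    assert len(scenes) == n_train + n_val + n_test
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = scenes[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical scene identifiers; the actual APND scene names differ.
scenes = [f"Kitchen_{i}" for i in range(1, 31)]
train, val, test = split_kitchen_scenes(scenes)
```

The three splits are disjoint by construction, since they are contiguous slices of one shuffled list.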
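The two warm-up stages described in the Experiment Setup row amount to a step-count schedule for the PPO rollout length: 32 steps per batch for the first 1M environment steps, 64 for the next 1M, then 128 until the end of training. A minimal sketch, assuming the schedule is keyed on total environment steps (the function name and thresholds' exact placement are assumptions, not the paper's AllenAct configuration):

```python
def rollout_length(step: int) -> int:
    """Steps per batch for PPO as a function of total environment steps.

    Stage 1 (first 1M steps): 32 steps per batch.
    Stage 2 (next 1M steps):  64 steps per batch.
    Afterwards: 128 steps per batch until the end of training (435M steps).
    """
    if step < 1_000_000:
        return 32
    elif step < 2_000_000:
        return 64
    return 128
```

Shorter rollouts early in training give more frequent parameter updates while the policy is still near its initialization, which is the usual motivation for this kind of warm-up.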