Selective Visual Representations Improve Convergence and Generalization for Embodied AI
Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. |
| Researcher Affiliation | Collaboration | University of Washington and Allen Institute for Artificial Intelligence |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and pretrained models are publicly available through the project page: https://embodied-codebook.github.io |
| Open Datasets | Yes | Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able to generalize better and converge faster when adapted to other simulation environments such as Habitat. |
| Dataset Splits | Yes | We train our agent with DD-PPO (Wijmans et al., 2019) using 80 samplers for 20M steps on the APND dataset (Ehsani et al., 2021) in 30 kitchen scenes, which we split into 20 training scenes, 5 validation scenes, and 5 testing scenes (an illustrative configuration sketch follows the table). |
| Hardware Specification | Yes | To facilitate this, we utilize multi-node training on 4 servers, each equipped with eight A100-80GB GPUs. |
| Software Dependencies | No | The paper mentions several software frameworks and algorithms (e.g., AllenAct, DD-PPO, Adam) but does not specify versioned software dependencies. |
| Experiment Setup | Yes | All models are trained using the AllenAct framework. We follow (Deitke et al., 2022b) to pretrain the EmbCLIP baseline and EmbCLIP-Codebook on the PROCTHOR-10k houses with 96 samplers for 435M steps. We optimize the models with the Adam optimizer (Kingma & Ba, 2014) at a fixed learning rate of 3e-4. In addition, we follow (Deitke et al., 2022b) in using two warm-up stages that train the model parameters with a lower number of steps per batch for PPO training (Schulman et al., 2017). In the first stage, we set the number of steps to 32, and in the second stage, we increase it to 64. Each of these two stages is trained for 1M steps. After the second stage, we increase the number of steps to 128 and keep it until the end of training (435M steps); a sketch of this schedule follows the table. |
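For the object-displacement run quoted in the Dataset Splits row, a minimal configuration sketch is shown below. The dictionary layout and the FloorPlan-style scene identifiers are illustrative assumptions rather than the authors' actual configuration schema; only the sampler count, step budget, and 20/5/5 scene split come from the quoted text.

```python
# Illustrative configuration sketch for the DD-PPO training run on APND
# described above (80 samplers, 20M steps, 20/5/5 kitchen-scene split).
# The dictionary keys and scene identifiers are assumptions for illustration.

KITCHEN_SCENES = [f"FloorPlan{i}" for i in range(1, 31)]  # 30 kitchen scenes (assumed IDs)

APND_TRAINING_CONFIG = {
    "algorithm": "DD-PPO",
    "num_samplers": 80,             # parallel environment samplers
    "total_env_steps": 20_000_000,  # 20M training steps
    "scenes": {
        "train": KITCHEN_SCENES[:20],   # 20 training scenes
        "val": KITCHEN_SCENES[20:25],   # 5 validation scenes
        "test": KITCHEN_SCENES[25:30],  # 5 testing scenes
    },
}
```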
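The Experiment Setup row describes a staged warm-up in which the PPO rollout length ("number of steps per batch") grows from 32 to 64 to 128. The sketch below expresses that schedule as a plain Python lookup; the function and constant names are assumptions, and only the stage durations, rollout values, learning rate, and total step budget are taken from the quoted setup.

```python
# Minimal sketch of the staged rollout-length warm-up described above,
# assuming a generic PPO trainer with a configurable rollout length.

ADAM_LR = 3e-4              # fixed learning rate reported in the paper
TOTAL_STEPS = 435_000_000   # total pretraining budget (435M env steps)

# (stage_end_step, rollout_length)
WARMUP_SCHEDULE = [
    (1_000_000, 32),     # stage 1: 32 steps per batch for the first 1M env steps
    (2_000_000, 64),     # stage 2: 64 steps per batch for the next 1M env steps
    (TOTAL_STEPS, 128),  # final stage: 128 steps per batch until training ends
]

def rollout_length_for(env_step: int) -> int:
    """Return the PPO rollout length to use at a given environment step."""
    for stage_end, length in WARMUP_SCHEDULE:
        if env_step < stage_end:
            return length
    return WARMUP_SCHEDULE[-1][1]
```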