Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Selective Visual Representations Improve Convergence and Generalization for Embodied AI
Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. |
| Researcher Affiliation | Collaboration | ¹University of Washington, ²Allen Institute for Artificial Intelligence |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and pretrained models are publicly available through the project page: https://embodied-codebook.github.io |
| Open Datasets | Yes | Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able to generalize better and converge faster when adapted to other simulation environments such as Habitat. (A generic codebook sketch appears below the table.) |
| Dataset Splits | Yes | We train our agent by DD-PPO (Wijmans et al., 2019) with 80 samplers for 20M steps on the APND dataset (Ehsani et al., 2021) in 30 kitchen scenes, where we split them into 20 training scenes, 5 validation scenes, and 5 testing scenes. (A minimal split sketch appears below the table.) |
| Hardware Specification | Yes | To facilitate this, we utilize multi-node training on 4 servers, each equipped with eight A100 80GB GPUs. |
| Software Dependencies | No | The paper mentions several software frameworks and algorithms (e.g., AllenAct, DD-PPO, PPO, Adam) but does not specify version numbers for its software dependencies. |
| Experiment Setup | Yes | All models are trained using the AllenAct framework. We follow (Deitke et al., 2022b) to pretrain the EmbCLIP baseline and EmbCLIP-Codebook on the ProcTHOR-10k houses with 96 samplers for 435M steps. ...we optimize the models with the Adam optimizer (Kingma & Ba, 2014) and a fixed learning rate of 3e-4. In addition, we follow (Deitke et al., 2022b) in using two warm-up stages that train the model with a lower number of steps per batch for PPO training (Schulman et al., 2017). In the first stage, we set the number of steps to 32, and in the second stage, we increase the number of steps to 64. Each of these two stages is trained for 1M steps. After the second stage, we increase the number of steps to 128 and keep it until the end of training (i.e., 435M steps). (The rollout-length schedule is sketched below the table.) |
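To make the Open Datasets row's mention of "filtered representations produced by the codebook" concrete, here is a minimal, generic sketch of a learned-codebook bottleneck: an input embedding is re-expressed as a softmax-weighted mixture of a small set of learned codes. The class name, layer sizes, and the soft selection mechanism are illustrative assumptions, not the authors' exact module.

```python
import torch
import torch.nn as nn

class CodebookBottleneck(nn.Module):
    """Generic learned-codebook bottleneck (a sketch, not the paper's module).

    An embedding (e.g., from a frozen visual encoder) is filtered through a
    small set of learned codes via a softmax-weighted convex combination.
    All sizes below are illustrative defaults, not the paper's values.
    """

    def __init__(self, in_dim: int = 1024, num_codes: int = 256, code_dim: int = 16):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, code_dim))  # learned codebook
        self.to_logits = nn.Linear(in_dim, num_codes)                # code-selection head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) visual embedding
        weights = torch.softmax(self.to_logits(x), dim=-1)  # (batch, num_codes)
        return weights @ self.codes                          # (batch, code_dim)

# Example: filter a batch of 4 embeddings into compact code mixtures
z = CodebookBottleneck()(torch.randn(4, 1024))
assert z.shape == (4, 16)
```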
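As a concrete reading of the Dataset Splits row, the following sketch produces a disjoint 20/5/5 split of 30 scenes. The scene names and the seeded shuffle are hypothetical; the paper reports only the split sizes, not how scenes were assigned.

```python
import random

# Hypothetical identifiers for the 30 kitchen scenes; the actual names and
# assignment procedure are not given in the paper.
scenes = [f"FloorPlan{i}" for i in range(1, 31)]

def split_scenes(scenes, n_train=20, n_val=5, n_test=5, seed=0):
    """Split scenes into disjoint train/val/test sets (20/5/5 as reported)."""
    assert n_train + n_val + n_test == len(scenes)
    rng = random.Random(seed)  # seeded shuffle is an assumption, not from the paper
    shuffled = scenes[:]
    rng.shuffle(shuffled)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )

train_scenes, val_scenes, test_scenes = split_scenes(scenes)
assert len(train_scenes) == 20 and len(val_scenes) == 5 and len(test_scenes) == 5
```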
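The Experiment Setup row describes a three-stage schedule for the number of steps per batch in PPO: 32 for the first 1M environment steps, 64 for the next 1M, then 128 until the end of training at 435M steps, with Adam at a fixed learning rate of 3e-4 throughout. The sketch below renders that schedule as a plain function; the function name and the exact boundary handling are assumptions rather than the authors' AllenAct configuration.

```python
def rollout_length(total_env_steps: int) -> int:
    """Steps-per-batch schedule reported in the paper: 32 -> 64 -> 128.

    Stage boundaries (1M and 2M env steps) follow the stated warm-up of
    1M steps per stage; exact boundary handling is an assumption.
    """
    if total_env_steps < 1_000_000:    # first warm-up stage
        return 32
    elif total_env_steps < 2_000_000:  # second warm-up stage
        return 64
    else:                              # main stage, until 435M steps
        return 128

# Example: the schedule at a few points in training
for step in (0, 1_500_000, 10_000_000):
    print(step, rollout_length(step))  # -> 32, 64, 128
```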