Layout-Aware Dreamer for Embodied Visual Referring Expression Grounding
Authors: Mingxiao Li, Zehao Wang, Tinne Tuytelaars, Marie-Francine Moens
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our agent achieves new state-of-the-art performance on the public leaderboard of the REVERIE dataset in challenging unseen test environments with improvement in navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43% compared to the previous state-of-the-art. |
| Researcher Affiliation | Academia | Computer Science Department of KU Leuven; Electrical Engineering Department (ESAT-PSI) of KU Leuven |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks, nor any clearly labeled algorithm sections or code-like formatted procedures. |
| Open Source Code | Yes | The code is released at https://github.com/zehao-wang/LAD |
| Open Datasets | Yes | Because the navigation task is characterized by realistic high-level instructions, we conduct experiments and evaluate our agent on the embodied goal-oriented benchmark REVERIE (Qi et al. 2020) and the SOON (Song et al. 2022) datasets. |
| Dataset Splits | Yes | The dataset is split into four sets, including the training set, validation seen set, validation unseen set, and test set. |
| Hardware Specification | Yes | The whole training procedure takes two days with a single NVIDIA-P100 GPU. |
| Software Dependencies | No | The paper mentions using GLIDE (Nichol et al. 2022) and CLIP (Radford et al. 2021) for data preprocessing and feature extraction, but it does not provide version numbers for these or for any other software dependencies, programming languages, or libraries used in the experiments (see the feature-extraction sketch after the table). |
| Experiment Setup | Yes | The model is trained for 100k iterations with a batch size of 32 for single action prediction and 50k iterations with a batch size of 8 for imitation learning with DAgger (Ross, Gordon, and Bagnell 2011). We optimize both phases by the AdamW (Loshchilov and Hutter 2018) optimizer with a learning rate of 5e-5 and 1e-5, respectively. (A schedule sketch follows the table.) |
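Because the paper names CLIP for feature extraction but pins no versions, any reproduction has to pick a concrete checkpoint and library. Below is a minimal sketch of CLIP visual-feature extraction, assuming the `openai/clip-vit-base-patch32` checkpoint accessed through the Hugging Face `transformers` API; the model variant, library, and normalization step are assumptions, not details taken from the paper.

```python
# Hedged sketch: CLIP visual-feature extraction for view images.
# The checkpoint and library below are assumptions; the paper does not
# state which CLIP variant or software versions were used.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed variant
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

@torch.no_grad()
def extract_view_features(image_paths):
    """Return L2-normalized CLIP image embeddings, one row per view."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)  # shape (N, 512) for ViT-B/32
    return feats / feats.norm(dim=-1, keepdim=True)
```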
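The quoted experiment setup maps onto a two-phase training schedule. The sketch below is a hedged reconstruction: the iteration counts, batch sizes, learning rates, and choice of AdamW come from the quoted text, while the agent model, dataloaders, and loss functions (`bc_loss`, `il_loss`) are hypothetical placeholders rather than the authors' implementation.

```python
# Hedged sketch of the two-phase schedule quoted above.
# Phase 1: 100k iterations, batch size 32, lr 5e-5 (single-action prediction).
# Phase 2: 50k iterations, batch size 8,  lr 1e-5 (imitation learning with DAgger).
from torch.optim import AdamW

def run_phase(model, dataloader, num_iters, lr, loss_fn):
    """Train `model` for a fixed number of iterations at a constant learning rate."""
    optimizer = AdamW(model.parameters(), lr=lr)
    data_iter = iter(dataloader)
    for _ in range(num_iters):
        try:
            batch = next(data_iter)
        except StopIteration:  # restart the loader when it is exhausted
            data_iter = iter(dataloader)
            batch = next(data_iter)
        loss = loss_fn(model, batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Hypothetical usage with placeholder names (not from the paper):
# run_phase(agent, single_action_loader, num_iters=100_000, lr=5e-5, loss_fn=bc_loss)
# run_phase(agent, dagger_loader,        num_iters=50_000,  lr=1e-5, loss_fn=il_loss)
```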