Layout-Aware Dreamer for Embodied Visual Referring Expression Grounding

Authors: Mingxiao Li, Zehao Wang, Tinne Tuytelaars, Marie-Francine Moens

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our agent achieves new state-of-the-art performance on the public leaderboard of the REVERIE dataset in challenging unseen test environments, improving navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43% over the previous state of the art.
Researcher Affiliation | Academia | (1) Computer Science Department, KU Leuven; (2) Electrical Engineering Department (ESAT-PSI), KU Leuven
Pseudocode | No | The paper contains no structured pseudocode or algorithm blocks, nor any clearly labeled algorithm sections or code-like formatted procedures.
Open Source Code | Yes | The code is released at https://github.com/zehao-wang/LAD
Open Datasets | Yes | Because the navigation task is characterized by realistic high-level instructions, the experiments evaluate the agent on the embodied goal-oriented benchmarks REVERIE (Qi et al. 2020) and SOON (Song et al. 2022).
Dataset Splits | Yes | The dataset is split into four sets: the training set, validation seen set, validation unseen set, and test set.
Hardware Specification | Yes | The whole training procedure takes two days on a single NVIDIA P100 GPU.
Software Dependencies | No | The paper mentions using GLIDE (Nichol et al. 2022) and CLIP (Radford et al. 2021) for data preprocessing and feature extraction, but does not provide version numbers for these or any other software dependencies, programming languages, or libraries used for the experiments.
Experiment Setup | Yes | The model is trained for 100k iterations with a batch size of 32 for single-action prediction, and for 50k iterations with a batch size of 8 for imitation learning with DAgger (Ross, Gordon, and Bagnell 2011). Both phases are optimized with the AdamW (Loshchilov and Hutter 2018) optimizer, with learning rates of 5e-5 and 1e-5, respectively.
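The experiment-setup row names AdamW as the optimizer for both training phases. As an illustration of what AdamW computes, here is a minimal pure-Python sketch of a single AdamW update on one scalar parameter; the learning rates (5e-5 for phase 1, 1e-5 for the DAgger phase) come from the paper, while all other hyperparameter values (betas, epsilon, weight decay) are conventional defaults, not values reported by the authors.

```python
import math

def adamw_step(p, g, m, v, t, lr=5e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter p with gradient g.

    m, v are the running first/second moment estimates; t is the 1-based
    step count. Returns the updated (p, m, v). Defaults other than lr are
    illustrative, not taken from the paper.
    """
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)           # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: the decay term acts on p directly rather than
    # being folded into the gradient, which is what distinguishes AdamW
    # from Adam with L2 regularization.
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p)
    return p, m, v

# Phase 1 (single-action prediction) uses lr=5e-5; for phase 2
# (imitation learning with DAgger) one would pass lr=1e-5 instead.
p, m, v = 1.0, 0.0, 0.0
p, m, v = adamw_step(p, g=0.5, m=m, v=v, t=1, lr=5e-5)
```

With a positive gradient and positive parameter, one step nudges p down by roughly lr (the bias-corrected signal-to-noise ratio is close to 1 on the first step), which is the expected small-step behavior at these learning rates.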