LINGO-Space: Language-Conditioned Incremental Grounding for Space

Authors: Dohyun Kim, Nayoung Oh, Deokmin Hwang, Daehyung Park

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our evaluations show that the estimation using polar distributions enables a robot to ground locations successfully through 20 table-top manipulation benchmark tests. We also show that updating the distribution helps the grounding method accurately narrow the referring space. We finally demonstrate the robustness of the space grounding with simulated manipulation and real quadruped robot navigation tasks. See the polar-distribution sketch after the table. |
| Researcher Affiliation | Academia | Dohyun Kim, Nayoung Oh, Deokmin Hwang, Daehyung Park*, Korea Advanced Institute of Science and Technology, Republic of Korea. {dohyun141, lightsalt, gsh04089, daehyung}@kaist.ac.kr |
| Pseudocode | No | The information is insufficient. The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and videos are available at https://lingo-space.github.io. |
| Open Datasets | Yes | CLIPORT's benchmark (Shridhar, Manuelli, and Fox 2022): we use three tasks designed to pack an object inside a referenced object. PARAGON's benchmark (Zhao, Lee, and Hsu 2023): this benchmark generates a dataset for the task of placing an object in the presence of semantically identical objects following one of nine directional relations: center, left, right, above, below, left above, left below, right above, and right below. SREM's benchmark (Gkanatsios et al. 2023): we use eight tasks designed to rearrange a colored object inside a referenced object following spatial instructions featuring one of four directional relations: left, right, behind, and front. See the relation-to-direction sketch after the table. |
| Dataset Splits | No | The information is insufficient. The paper specifies training and testing samples (e.g., 'For PARAGON benchmark, for each task, training and testing employ 400 and 200 scenes, respectively. Otherwise, we train models on 100 samples, with subsequent testing performed on 200 randomized samples.'), but does not explicitly mention a validation split or how it was used for hyperparameter tuning. See the split sketch after the table. |
| Hardware Specification | No | The information is insufficient. The paper mentions the Boston Dynamics Spot robot used for real-world demonstrations and various software tools and models, but does not provide specific hardware details (such as GPU models, CPU types, or memory) used for running the main experiments or training the models. |
| Software Dependencies | No | The information is insufficient. The paper mentions various software components and models (e.g., 'PyBullet simulator', 'ChatGPT (OpenAI 2023)', 'Grounding DINO (Liu et al. 2023)', 'CLIP image encoder (Radford et al. 2021)'), but does not provide specific version numbers for these or other key software dependencies required for replication. See the dependency probe after the table. |
| Experiment Setup | No | The information is insufficient. The paper describes the datasets, evaluation metrics, and general training/testing sample counts (e.g., 'train models on 100 samples, with subsequent testing performed on 200 randomized samples'), but it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size), optimizer settings, or the number of epochs used for training. |
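The Research Type row quotes the paper's core claim: locations are grounded by estimating distributions in polar coordinates and narrowing them as referring clauses accumulate. The sketch below illustrates that idea only; the Gaussian radial term, von Mises angular term, product-of-clauses update, and all parameter names (mu_r, kappa, ...) are assumptions of this illustration, not the paper's exact formulation.

```python
import numpy as np

def polar_score(xy, ref_xy, mu_r, sigma_r, mu_theta, kappa):
    """Unnormalized score of a candidate location under one polar
    distribution: a Gaussian over radial distance from the reference
    object times a von Mises term over direction (illustrative forms)."""
    d = np.asarray(xy, dtype=float) - np.asarray(ref_xy, dtype=float)
    r = np.hypot(d[0], d[1])
    theta = np.arctan2(d[1], d[0])
    radial = np.exp(-0.5 * ((r - mu_r) / sigma_r) ** 2)
    angular = np.exp(kappa * np.cos(theta - mu_theta))
    return radial * angular

def incremental_ground(xy, clauses):
    """Each referring clause multiplies in its own polar score, so every
    added relation narrows the remaining high-probability region."""
    score = 1.0
    for c in clauses:
        score *= polar_score(xy, **c)
    return score

# Example: "left of the box, and far from the bowl" (made-up parameters).
clauses = [
    dict(ref_xy=(0.5, 0.0), mu_r=0.2, sigma_r=0.05, mu_theta=np.pi, kappa=4.0),
    dict(ref_xy=(0.0, 0.3), mu_r=0.5, sigma_r=0.10, mu_theta=0.0, kappa=0.0),  # kappa=0: no preferred direction
]
print(incremental_ground((0.3, 0.0), clauses))
```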
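For the Open Datasets row, PARAGON's nine directional relations can be pictured as direction angles around a reference object. The mapping, axis convention (x to the right, y away from the viewer), and tolerances below are hypothetical; this illustrates the relation semantics rather than reproducing the benchmark's actual success checker.

```python
import numpy as np

# Hypothetical mapping from PARAGON's directional relations to a planar
# direction angle in radians; "center" is handled separately below.
RELATION_ANGLE = {
    "right": 0.0,
    "right above": np.pi / 4,
    "above": np.pi / 2,
    "left above": 3 * np.pi / 4,
    "left": np.pi,
    "left below": -3 * np.pi / 4,
    "below": -np.pi / 2,
    "right below": -np.pi / 4,
}

def satisfies(target_xy, ref_xy, relation, ang_tol=np.pi / 8, center_tol=0.05):
    """Illustrative check: does the target sit in the direction named by
    `relation`, as seen from the reference object? Tolerances are assumed."""
    d = np.asarray(target_xy, dtype=float) - np.asarray(ref_xy, dtype=float)
    if relation == "center":
        return float(np.hypot(d[0], d[1])) < center_tol
    theta = np.arctan2(d[1], d[0])
    # Wrap the angular error into (-pi, pi] before comparing to the tolerance.
    err = np.angle(np.exp(1j * (theta - RELATION_ANGLE[relation])))
    return abs(err) <= ang_tol

print(satisfies((0.1, 0.4), (0.1, 0.1), "above"))  # True: directly above
```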
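The Dataset Splits row reports only the counts (400/200 scenes per PARAGON task, 100/200 samples elsewhere) and no validation split. A minimal split helper consistent with those counts might look as follows; the shuffle and seed are assumptions, since the paper does not say how scenes are assigned.

```python
import random

def split_scenes(scenes, n_train, n_test, seed=0):
    """Deterministic shuffle-then-slice split. Only the counts come from
    the paper; the shuffling scheme and seed are assumptions."""
    rng = random.Random(seed)
    idx = list(range(len(scenes)))
    rng.shuffle(idx)
    train = [scenes[i] for i in idx[:n_train]]
    test = [scenes[i] for i in idx[n_train:n_train + n_test]]
    return train, test

# PARAGON tasks: 400 train / 200 test scenes; other benchmarks: 100 / 200.
train, test = split_scenes(list(range(600)), n_train=400, n_test=200)
```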
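The Software Dependencies row names the key tools without versions. Under that constraint, a quick environment probe can at least confirm the tools import; the module names below (pybullet, openai, groundingdino, clip) are assumptions based on each project's common packaging, not versions or names confirmed by the paper.

```python
import importlib

# Tools named in the paper; no versions are pinned there, and the
# module names here are assumed from each project's usual pip install.
PACKAGES = {
    "pybullet": "PyBullet simulator",
    "openai": "ChatGPT API client",
    "groundingdino": "Grounding DINO detector",
    "clip": "OpenAI CLIP encoder",
}

for module, role in PACKAGES.items():
    try:
        importlib.import_module(module)
        status = "found"
    except ImportError:
        status = "missing"
    print(f"{module:14s} ({role}): {status}")
```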