Interactive Visual Task Learning for Robots

Authors: Weiwei Gu, Anant Sah, Nakul Gopalan

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present two sets of results. First, we compare Hi Viscont with the baseline model (FALCON) on visual question answering (VQA) in three domains. Second, we conduct a human-subjects experiment where users teach our robot visual tasks in-situ. Our framework achieves a 33% improvement in the success rate metric and a 19% improvement in object-level accuracy compared to the baseline model. (A generic metric sketch follows the table.)
Researcher Affiliation | Academia | School of Computing and Augmented Intelligence, Arizona State University ({weiweigu, asah4, ng}@asu.edu)
Pseudocode | No | The paper describes its methods in natural language and mathematical formulas but does not provide structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions an associated webpage for instruction videos and manuals, but does not provide a link to, or an explicit statement about, open-source code for the methodology.
Open Datasets | Yes | We first present experimental results on VQA tasks for three domains: the CUB-200-2011 dataset, a custom house-construction domain with building blocks, and a custom zoo domain with terrestrial and aquatic animals. CUB-200-2011 is a well-known public dataset.
Dataset Splits | No | The paper mentions "validation questions" in the context of training for gradient flow, but it does not provide explicit training/validation/test splits (e.g., specific percentages or sample counts for each split) for reproducibility. (See the split-loading sketch after the table.)
Hardware Specification | No | The paper specifies the robot arm (Franka Emika Research 3) and cameras (Intel RealSense D435 depth cameras) used for the robotic setup, but does not describe the computing hardware (e.g., GPU or CPU models) used to train or run the models.
Software Dependencies | No | The paper mentions using SAM (Segment Anything Model) and a pretrained BERT-base model, but does not provide version numbers for any software libraries, frameworks, or programming languages. (See the environment sketch after the table.)
Experiment Setup | No | The paper states that "Both concept net models are trained with the same split of concepts and the same training data for the same number of steps" and mentions "validation questions" during training, but it does not report specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings in the main text; a more detailed description is deferred to the appendix.
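For context on the metrics quoted in the Research Type row, here is a generic sketch of how a task-level success rate and an object-level accuracy are commonly computed from per-trial outcomes. The paper's exact metric definitions are not reproduced here and may differ; the functions and example numbers below are illustrative assumptions only.

```python
# Generic metric sketch (an assumption about common definitions, not the
# paper's exact formulation): success rate over whole trials and accuracy
# over individual objects aggregated across trials.

def success_rate(trial_success):
    """trial_success: list of bools, True if the whole task succeeded."""
    return sum(trial_success) / len(trial_success)


def object_level_accuracy(per_trial_counts):
    """per_trial_counts: list of (correct_objects, total_objects) tuples."""
    correct = sum(c for c, _ in per_trial_counts)
    total = sum(t for _, t in per_trial_counts)
    return correct / total


# Example: 3 trials, the first two fully successful.
print(success_rate([True, True, False]))                # ~0.667
print(object_level_accuracy([(4, 4), (4, 4), (2, 4)]))  # ~0.833
```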
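As a point of reference for the Dataset Splits entry, below is a minimal sketch of how the official train/test split shipped with the public CUB-200-2011 release can be recovered from the dataset's metadata files (images.txt, image_class_labels.txt, train_test_split.txt). This is not the authors' own data pipeline, and the splits for the paper's custom house-construction and zoo domains are not specified anywhere, so they are not reconstructed here.

```python
# Minimal sketch: load the official train/test split that ships with the
# public CUB-200-2011 release. This illustrates the kind of split
# specification the paper omits; it is not the authors' pipeline.
from pathlib import Path


def load_cub_split(root):
    """Return (train, test) lists of (image_path, class_id) pairs."""
    root = Path(root)  # directory containing images.txt etc.

    # image_id -> path of the image on disk
    images = {}
    with open(root / "images.txt") as f:
        for line in f:
            image_id, name = line.split()
            images[image_id] = root / "images" / name

    # image_id -> integer class id (1..200)
    labels = {}
    with open(root / "image_class_labels.txt") as f:
        for line in f:
            image_id, class_id = line.split()
            labels[image_id] = int(class_id)

    # train_test_split.txt marks each image as training (1) or test (0)
    train, test = [], []
    with open(root / "train_test_split.txt") as f:
        for line in f:
            image_id, is_train = line.split()
            bucket = train if is_train == "1" else test
            bucket.append((images[image_id], labels[image_id]))
    return train, test


if __name__ == "__main__":
    train, test = load_cub_split("CUB_200_2011")
    print(f"{len(train)} training images, {len(test)} test images")
```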
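Similarly, for the Software Dependencies entry, the following is a hedged environment sketch assuming the publicly released segment-anything package and Hugging Face transformers. The paper does not confirm these packages or any versions, so the pins, model names, and checkpoint file below are illustrative placeholders rather than the authors' actual setup.

```python
# Hypothetical environment sketch. The paper names SAM and a pretrained
# BERT-base model without versions; the packages, versions, and checkpoint
# below are assumptions for illustration, not the authors' configuration.
#
#   pip install torch transformers==4.35.2 \
#       git+https://github.com/facebookresearch/segment-anything.git

from transformers import AutoModel, AutoTokenizer
from segment_anything import SamPredictor, sam_model_registry

# Pretrained BERT-base text encoder (exact variant unspecified in the paper).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

# Segment Anything Model; "vit_h" and the checkpoint name follow the public
# release, not a setting reported in the paper.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

print(bert.config.hidden_size)  # 768 hidden units for BERT-base
```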