Weakly Supervised Multimodal Affordance Grounding for Egocentric Images

Authors: Lingjing Xu, Yang Gao, Wenfeng Song, Aimin Hao

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we demonstrate the superiority of our proposed method in terms of evaluation metrics and visual results when compared to existing affordance grounding models. Furthermore, ablation experiments confirm the effectiveness of our approach.
Researcher Affiliation | Academia | 1 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; 2 Computer School, Beijing Information Science and Technology University, China
Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code: https://github.com/xulingjing88/WSMA
Open Datasets | Yes | We use the Affordance Grounding Dataset (AGD20K) (Luo et al. 2022b), which is a comprehensive dataset containing various viewpoints, specifically, 20,061 exocentric and 3,755 egocentric images. These images represent 36 unique affordance categories. We conduct evaluations under two distinct settings: Seen and Unseen. In addition to AGD20K, we have assembled a new dataset, HICO-IIF, by selecting specific subsets from the HICO-DET (Chao et al. 2018) and IIT-AFF (Nguyen et al. 2017) datasets.
Dataset Splits | No | The paper mentions 'Seen' and 'Unseen' settings for evaluation, but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using DINO-ViT and CLIP as backbones, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We set the hyperparameters λcls, λclip, λd, and λrela to 1, 1, 0.5, and 0.5 respectively, while the threshold is fixed at 0.2. Further details regarding parameter configurations can be found in the Appendix.
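The reported setup amounts to a weighted sum of four loss terms plus a fixed activation threshold. The sketch below illustrates how those stated values could be wired up; the loss-term names (classification, CLIP alignment, distillation, relation) are assumptions for illustration, since the paper defers the full configuration to its Appendix and specifies only the weights (λcls = λclip = 1, λd = λrela = 0.5) and the 0.2 threshold.

```python
# Hedged sketch of the paper's reported hyperparameters.
# Only the weight values and the 0.2 threshold come from the paper;
# the individual loss names are hypothetical placeholders.
LAMBDA_CLS, LAMBDA_CLIP, LAMBDA_D, LAMBDA_RELA = 1.0, 1.0, 0.5, 0.5
THRESHOLD = 0.2

def total_loss(l_cls, l_clip, l_d, l_rela):
    """Weighted sum of the four training loss terms."""
    return (LAMBDA_CLS * l_cls + LAMBDA_CLIP * l_clip
            + LAMBDA_D * l_d + LAMBDA_RELA * l_rela)

def binarize(heatmap, threshold=THRESHOLD):
    """Zero out affordance-map activations at or below the threshold."""
    return [[v if v > threshold else 0.0 for v in row] for row in heatmap]
```

For example, `total_loss(1.0, 1.0, 2.0, 2.0)` evaluates to `4.0`, and `binarize` suppresses any map cell whose activation does not exceed 0.2.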