Weakly Supervised Multimodal Affordance Grounding for Egocentric Images

Authors: Lingjing Xu, Yang Gao, Wenfeng Song, Aimin Hao

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we demonstrate the superiority of our proposed method in terms of evaluation metrics and visual results when compared to existing affordance grounding models. Furthermore, ablation experiments confirm the effectiveness of our approach.
Researcher Affiliation | Academia | 1 State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; 2 Computer School, Beijing Information Science and Technology University, China
Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code: https://github.com/xulingjing88/WSMA
Open Datasets | Yes | We use the Affordance Grounding Dataset (AGD20K) (Luo et al. 2022b), which is a comprehensive dataset containing various viewpoints, specifically, 20,061 exocentric and 3,755 egocentric images. These images represent 36 unique affordance categories. We conduct evaluations under two distinct settings: Seen and Unseen. In addition to AGD20K, we have assembled a new dataset, HICO-IIF, by selecting specific subsets from the HICO-DET (Chao et al. 2018) and IIT-AFF (Nguyen et al. 2017) datasets.
Dataset Splits | No | The paper mentions 'Seen' and 'Unseen' settings for evaluation, but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions using DINO-ViT and CLIP as backbones, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We set the hyperparameters λcls, λclip, λd, and λrela to 1, 1, 0.5, and 0.5 respectively, while the threshold is fixed at 0.2. Further details regarding parameter configurations can be found in the Appendix.
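The reported setup amounts to a weighted sum of four loss terms plus a fixed activation threshold. The sketch below illustrates how those stated values could be wired up; the loss-term names (classification, CLIP alignment, distillation, relation) are assumptions for illustration, since the paper defers the full configuration to its Appendix and specifies only the weights (λcls = λclip = 1, λd = λrela = 0.5) and the 0.2 threshold.

```python
# Hedged sketch of the paper's reported hyperparameters.
# Only the weight values and the 0.2 threshold come from the paper;
# the individual loss names are hypothetical placeholders.
LAMBDA_CLS, LAMBDA_CLIP, LAMBDA_D, LAMBDA_RELA = 1.0, 1.0, 0.5, 0.5
THRESHOLD = 0.2

def total_loss(l_cls, l_clip, l_d, l_rela):
    """Weighted sum of the four training loss terms."""
    return (LAMBDA_CLS * l_cls + LAMBDA_CLIP * l_clip
            + LAMBDA_D * l_d + LAMBDA_RELA * l_rela)

def binarize(heatmap, threshold=THRESHOLD):
    """Zero out affordance-map activations at or below the threshold."""
    return [[v if v > threshold else 0.0 for v in row] for row in heatmap]
```

For example, `total_loss(1.0, 1.0, 2.0, 2.0)` evaluates to `4.0`, and `binarize` suppresses any map cell whose activation does not exceed 0.2.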