Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Weakly Supervised Multimodal Affordance Grounding for Egocentric Images
Authors: Lingjing Xu, Yang Gao, Wenfeng Song, Aimin Hao
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we demonstrate the superiority of our proposed method in terms of evaluation metrics and visual results when compared to existing affordance grounding models. Furthermore, ablation experiments confirm the effectiveness of our approach. |
| Researcher Affiliation | Academia | ¹State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; ²Computer School, Beijing Information Science and Technology University, China |
| Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code: https://github.com/xulingjing88/WSMA |
| Open Datasets | Yes | We use the Affordance Grounding Dataset (AGD20K) (Luo et al. 2022b), which is a comprehensive dataset containing various viewpoints, specifically, 20,061 exocentric and 3,755 egocentric images. These images represent 36 unique affordance categories. We conduct evaluations under two distinct settings: Seen and Unseen. In addition to AGD20K, we have assembled a new dataset, HICO-IIF, by selecting specific subsets from the HICO-DET (Chao et al. 2018) and IIT-AFF (Nguyen et al. 2017) datasets. |
| Dataset Splits | No | The paper mentions 'Seen' and 'Unseen' settings for evaluation, but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using DINO-ViT and CLIP as backbones, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set the hyperparameters λcls, λclip, λd, and λrela to 1, 1, 0.5, and 0.5 respectively, while the threshold is fixed at 0.2. Further details regarding parameter configurations can be found in the Appendix. |
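The quoted experiment setup can be sketched as a weighted loss combination using the reported values. Note the term names (`l_cls`, `l_clip`, `l_d`, `l_rela`) and the assumption that the weights enter as a simple additive sum are illustrative only; the paper defers the exact formulation to its Appendix.

```python
# Hedged sketch of the reported hyperparameter configuration
# (λcls = 1, λclip = 1, λd = 0.5, λrela = 0.5; threshold = 0.2).
LAMBDA_CLS, LAMBDA_CLIP, LAMBDA_D, LAMBDA_RELA = 1.0, 1.0, 0.5, 0.5
THRESHOLD = 0.2  # fixed threshold stated in the experiment setup


def total_loss(l_cls: float, l_clip: float, l_d: float, l_rela: float) -> float:
    """Assumed weighted sum of the four loss terms (not the paper's exact form)."""
    return (LAMBDA_CLS * l_cls + LAMBDA_CLIP * l_clip
            + LAMBDA_D * l_d + LAMBDA_RELA * l_rela)


# Example: unit losses give 1 + 1 + 0.5 + 0.5 = 3.0
print(total_loss(1.0, 1.0, 1.0, 1.0))
```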