Visually Grounded Commonsense Knowledge Acquisition

Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Mengdi Li, Ruobing Xie, Cornelius Weber, Zhiyuan Liu, Hai-Tao Zheng, Stefan Wermter, Tat-Seng Chua, Maosong Sun

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experimental results in held-out and human evaluation show that CLEVER can extract commonsense knowledge in promising quality, outperforming pre-trained language model-based methods by 3.9 AUC and 6.4 mAUC points. The predicted commonsense scores show strong correlation with human judgment, with a 0.78 Spearman coefficient. (A hedged sketch of these metrics follows the table.)
Researcher Affiliation | Collaboration | Yuan Yao1, Tianyu Yu2, Ao Zhang4, Mengdi Li5, Ruobing Xie6, Cornelius Weber4, Zhiyuan Liu1*, Hai-Tao Zheng2,3, Stefan Wermter5, Tat-Seng Chua4, Maosong Sun1. 1Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China; 2Shenzhen International Graduate School, Tsinghua University; 3Peng Cheng Laboratory; 4School of Computing, National University of Singapore, Singapore; 5Department of Informatics, University of Hamburg, Hamburg, Germany; 6WeChat AI, Tencent
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The data and codes can be obtained at https://github.com/thunlp/CLEVER.
Open Datasets | Yes | We construct the CKE benchmark based on Visual Genome (Krishna et al. 2017), which contains relational triplets about entities from real-world image data.
Dataset Splits | Yes | For automatic held-out evaluation (Mintz et al. 2009), we split the triplets into disjoint training, validation and test sets. Each entity pair is associated with Visual Genome images that contain the entities. The training/validation/test data contains 13,780/1,166/3,496 commonsense facts, 6,443/678/1,964 entity pairs, and 55,911/5,224/13,722 images respectively.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments.
Software Dependencies | No | The paper mentions using VinVL, CLIP, and Neural Motif models, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | No | The paper refers to an appendix for implementation details but does not include specific hyperparameters (e.g., learning rate, batch size, number of epochs) or optimizer settings in the main text.
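
The results row above cites AUC, mAUC, and a Spearman coefficient against human judgment. The exact evaluation protocol is defined in the paper and its appendix; the snippet below is only a minimal sketch of how such metrics are commonly computed, assuming scored triplets with held-out binary labels grouped by relation plus a set of human ratings. All variable names and the toy data are illustrative and are not taken from the CLEVER codebase; in particular, mAUC is assumed here to mean the AUC macro-averaged over relations.

```python
# Hedged sketch of the evaluation metrics cited in the report: overall AUC,
# macro-averaged AUC over relations (assumed meaning of mAUC), and Spearman
# correlation with human judgment. The data below is synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score
from scipy.stats import spearmanr

# Predicted commonsense scores for candidate triplets, held-out binary labels,
# and the relation each triplet expresses (all illustrative).
scores = np.array([0.91, 0.15, 0.67, 0.42, 0.88, 0.05, 0.73, 0.30])
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])
relations = np.array(["on", "on", "holds", "holds", "wears", "wears", "on", "holds"])

# Overall AUC across all triplets.
auc = roc_auc_score(labels, scores)

# Macro-averaged AUC over relations; relations whose held-out labels are all
# positive or all negative are skipped, since AUC is undefined for them.
per_relation = []
for rel in np.unique(relations):
    mask = relations == rel
    if len(np.unique(labels[mask])) == 2:
        per_relation.append(roc_auc_score(labels[mask], scores[mask]))
mauc = float(np.mean(per_relation))

# Spearman correlation between predicted scores and (synthetic) human ratings.
human_ratings = np.array([5, 1, 4, 2, 5, 1, 4, 2])
rho, _ = spearmanr(scores, human_ratings)

print(f"AUC: {auc:.3f}  mAUC: {mauc:.3f}  Spearman: {rho:.3f}")
```

On the toy data the script prints all three numbers on one line; with real CLEVER outputs, the scores, labels, relations, and human ratings would instead come from the held-out split and the human annotation files released with the paper's repository.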