Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improving Multimodal Social Media Popularity Prediction via Selective Retrieval Knowledge Augmentation

Authors: Xovee Xu, Yifan Zhang, Fan Zhou, Jingkuan Song

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three large-scale social media datasets demonstrate significant improvements ranging from 26.68% to 48.19% across all metrics compared to strong baselines.
Researcher Affiliation | Academia | University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
Pseudocode | No | The paper describes the methodology with figures (Figure 2) and detailed text descriptions for each component (Meta Retriever, Selective Refiner, Knowledge-Augmented Prediction Network), but does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Source codes and datasets are available at https://github.com/YifanZhang-git/SKAPP.
Open Datasets | Yes | Extensive experiments on three large-scale social media datasets demonstrate significant improvements ranging from 26.68% to 48.19% across all metrics compared to strong baselines. ... Datasets Three real-world social media datasets comprising multimodal UGCs: ICIP (Ortis, Farinella, and Battiato 2019), SMPD (Wu et al. 2023), and Instagram (Kim et al. 2020). ... Source codes and datasets are available at https://github.com/YifanZhang-git/SKAPP.
Dataset Splits | Yes | The dataset split ratio is 8:1:1 for training, validation, and test sets, respectively.
Hardware Specification | Yes | In practice, for a dataset of approximately 300K UGCs, the prediction and retrieval costs of SKAPP are about 50 seconds and 7 hours, respectively, when running on a system with a 5.40GHz CPU, an NVIDIA 3090Ti GPU with 24GB memory, and 24GB DDR4 RAM at 3200MHz.
Software Dependencies | No | We use PyTorch to implement the SKAPP model, with Adam optimizer and an initial learning rate of 1e-4.
Experiment Setup | Yes | The dataset split ratio is 8:1:1 for training, validation, and test sets, respectively. We use PyTorch to implement the SKAPP model, with Adam optimizer and an initial learning rate of 1e-4. The number of retrieved UGCs is 500, k1 of BM25 is 0.5, the threshold θ of selective refiner is 0, and the dimensions for embeddings v and t are 768.
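
For readers who want to carry the reported setup into their own experiments, the values quoted under Dataset Splits and Experiment Setup can be collected in one place. The sketch below is illustrative only and is not taken from the SKAPP repository: the class name `ReportedSetup`, its field names, and the helper `split_8_1_1` are hypothetical, and the shuffling and seed are assumptions, since the report states only the 8:1:1 ratio.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters quoted in the report; field names are hypothetical."""
    learning_rate: float = 1e-4      # initial learning rate used with Adam
    num_retrieved: int = 500         # number of retrieved UGCs
    bm25_k1: float = 0.5             # k1 parameter of BM25
    refiner_threshold: float = 0.0   # threshold theta of the selective refiner
    embed_dim: int = 768             # dimension of embeddings v and t


def split_8_1_1(items, seed=0):
    """Shuffle items and split them into train/validation/test at 8:1:1.

    Shuffling and the seed are assumptions; only the ratio is reported.
    Integer arithmetic avoids floating-point rounding in the split sizes.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]   # remainder goes to the test set
    return train, val, test


cfg = ReportedSetup()
train, val, test = split_8_1_1(range(1000))
```

In a PyTorch training script, `cfg.learning_rate` would typically be passed as `lr` to `torch.optim.Adam`; how the authors actually wire these values together is not described in the report.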