Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improving Multimodal Social Media Popularity Prediction via Selective Retrieval Knowledge Augmentation
Authors: Xovee Xu, Yifan Zhang, Fan Zhou, Jingkuan Song
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three large-scale social media datasets demonstrate significant improvements ranging from 26.68% to 48.19% across all metrics compared to strong baselines. |
| Researcher Affiliation | Academia | University of Electronic Science and Technology of China, Chengdu, Sichuan 610054 China EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology with figures (Figure 2) and detailed text descriptions for each component (Meta Retriever, Selective Refiner, Knowledge-Augmented Prediction Network), but does not include any explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Source codes and datasets are available at https://github.com/YifanZhang-git/SKAPP. |
| Open Datasets | Yes | Extensive experiments on three large-scale social media datasets demonstrate significant improvements ranging from 26.68% to 48.19% across all metrics compared to strong baselines. ... Datasets Three real-world social media datasets comprising multimodal UGCs: ICIP (Ortis, Farinella, and Battiato 2019), SMPD (Wu et al. 2023), and Instagram (Kim et al. 2020). ... Source codes and datasets are available at https://github.com/YifanZhang-git/SKAPP. |
| Dataset Splits | Yes | The dataset split ratio is 8:1:1 for training, validation, and test sets, respectively. |
| Hardware Specification | Yes | In practice, for a dataset of approximately 300K UGCs, the prediction and retrieval costs of SKAPP are about 50 seconds and 7 hours, respectively, when running on a system with a 5.40GHz CPU, an NVIDIA 3090Ti GPU with 24GB memory, and 24GB DDR4 RAM at 3200MHz. |
| Software Dependencies | No | We use PyTorch to implement the SKAPP model, with Adam optimizer and an initial learning rate of 1e-4. |
| Experiment Setup | Yes | The dataset split ratio is 8:1:1 for training, validation, and test sets, respectively. We use PyTorch to implement the SKAPP model, with Adam optimizer and an initial learning rate of 1e-4. The number of retrieved UGCs is 500, k1 of BM25 is 0.5, the threshold θ of selective refiner is 0, and the dimensions for embeddings v and t are 768. |
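The experiment setup quoted above (8:1:1 split, Adam optimizer, initial learning rate 1e-4, 768-dimensional embeddings) can be sketched as a minimal PyTorch configuration. This is a hedged illustration only: the two-layer model and the synthetic tensors are hypothetical placeholders, not the actual SKAPP architecture (Meta Retriever, Selective Refiner, Knowledge-Augmented Prediction Network) or the real UGC features.

```python
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, random_split

# Embedding dimension for v and t, per the reported setup.
embed_dim = 768

# Hypothetical placeholder model; the real SKAPP network is not reproduced here.
model = nn.Sequential(
    nn.Linear(2 * embed_dim, 256),  # concatenated visual + textual embeddings
    nn.ReLU(),
    nn.Linear(256, 1),              # popularity score regression head
)

# Synthetic stand-in data: 100 UGCs with concatenated (v, t) features.
dataset = TensorDataset(torch.randn(100, 2 * embed_dim), torch.randn(100, 1))

# 8:1:1 train/validation/test split, as reported.
n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val]
)

# Adam optimizer with an initial learning rate of 1e-4, per the reported setup.
optimizer = optim.Adam(model.parameters(), lr=1e-4)
```

The retrieval-side hyperparameters (500 retrieved UGCs, BM25 k1 = 0.5, refiner threshold θ = 0) configure components not shown in this sketch.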