Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Improving Multimodal Social Media Popularity Prediction via Selective Retrieval Knowledge Augmentation

Authors: Xovee Xu, Yifan Zhang, Fan Zhou, Jingkuan Song

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on three large-scale social media datasets demonstrate significant improvements ranging from 26.68% to 48.19% across all metrics compared to strong baselines.
Researcher Affiliation	Academia	University of Electronic Science and Technology of China, Chengdu, Sichuan 610054 China EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology with figures (Figure 2) and detailed text descriptions for each component (Meta Retriever, Selective Refiner, Knowledge-Augmented Prediction Network), but does not include any explicit pseudocode blocks or algorithms labeled as such.
Open Source Code	Yes	Source codes and datasets are available at https://github.com/Yifan Zhang-git/SKAPP.
Open Datasets	Yes	Extensive experiments on three large-scale social media datasets demonstrate significant improvements ranging from 26.68% to 48.19% across all metrics compared to strong baselines. ... Datasets Three real-world social media datasets comprising multimodal UGCs: ICIP (Ortis, Farinella, and Battiato 2019), SMPD (Wu et al. 2023), and Instagram (Kim et al. 2020). ... Source codes and datasets are available at https://github.com/Yifan Zhang-git/SKAPP.
Dataset Splits	Yes	The dataset split ratio is 8:1:1 for training, validation, and test sets, respectively.
Hardware Specification	Yes	In practice, for a dataset of approximately 300K UGCs, the prediction and retrieval costs of SKAPP are about 50 seconds and 7 hours, respectively, when running on a system with a 5.40GHz CPU, an NVIDIA 3090Ti GPU with 24GB memory, and 24GB DDR4 RAM at 3200MHz.
Software Dependencies	No	We use PyTorch to implement the SKAPP model, with Adam optimizer and an initial learning rate of 1e-4.
Experiment Setup	Yes	The dataset split ratio is 8:1:1 for training, validation, and test sets, respectively. We use PyTorch to implement the SKAPP model, with Adam optimizer and an initial learning rate of 1e-4. The number of retrieved UGCs is 500, k1 of BM25 is 0.5, the threshold θ of selective refiner is 0, and the dimensions for embeddings v and t are 768.
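
The split ratio and hyperparameters quoted in the Dataset Splits and Experiment Setup rows can be collected in a short sketch. This is an illustration only, not the authors' code: the names `REPORTED_SETUP` and `split_8_1_1`, the dictionary keys, the random shuffle, and the seed are all assumptions, since the paper excerpts state only the 8:1:1 ratio and the listed values.

```python
import random

# Values copied from the Experiment Setup row; the key names are hypothetical.
REPORTED_SETUP = {
    "optimizer": "Adam",
    "initial_learning_rate": 1e-4,
    "num_retrieved_ugcs": 500,
    "bm25_k1": 0.5,
    "selective_refiner_threshold": 0.0,
    "embedding_dim": 768,  # dimension of embeddings v and t
}


def split_8_1_1(items, seed=0):
    """Shuffle items and split them 8:1:1 into train/validation/test.

    The shuffle and the seed are assumptions; the source reports only the ratio.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_8_1_1(range(1000))
print(len(train), len(val), len(test))
```

Integer arithmetic is used for the split sizes so that any remainder from rounding lands in the test set rather than being dropped.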