Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Ghidorah: Towards Robust Multi-Scale Information Diffusion Prediction via Test-Time Training
Authors: Wenting Zhu, Chaozhuo Li, Litian Zhang, Senzhang Wang, Xi Zhang
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across several benchmark datasets validate the superiority of our approach. [...] Experimental Settings Datasets Following previous works (Yang et al. 2021; Sun et al. 2022), we evaluate the proposed framework on four datasets collected from real-world platforms: Christianity, Android, Memetracker (Jiao et al. 2024), and Douban. [...] Performance Comparison Table 2 and Table 3 report the results for microscopic prediction, while Table 4 summarizes the results for macroscopic prediction. [...] Ablation Study We conduct a series of ablation studies on the Christianity and Douban datasets to evaluate the importance of each module within Ghidorah. [...] Hyperparameter Sensitivity Analysis Mask Ratio pm in MAE [...] Auxiliary Task Loss Weight α [...] Number of Constructed Environment F [...] Gradient Steps δ during Test-Time Training |
| Researcher Affiliation | Academia | 1Key Laboratory of Trustworthy Distributed Computing and Service (MoE), Beijing University of Posts and Telecommunications, China 2School of Cyber Science and Technology, Beihang University, China 3School of Computer Science and Engineering, Central South University, China EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology and workflow in detail with figures and equations, but does not include a distinct pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about open-sourcing the code or a link to a code repository. |
| Open Datasets | Yes | Following previous works (Yang et al. 2021; Sun et al. 2022), we evaluate the proposed framework on four datasets collected from real-world platforms: Christianity, Android, Memetracker (Jiao et al. 2024), and Douban. |
| Dataset Splits | Yes | We randomly sample 80% of the cascades for training, 10% for validation, and the remaining 10% for testing. |
| Hardware Specification | No | The paper states: "Our model is implemented in PyTorch. The results for Ghidorah are presented as the mean of five runs to ensure reliable evaluation." However, it does not specify any hardware details such as CPU or GPU models, or memory. |
| Software Dependencies | No | Our model is implemented in PyTorch. The paper mentions PyTorch but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | All hyperparameters are selected through a grid search algorithm based on validation set performance, with final results reported on the test set. [...] Mask Ratio pm in MAE [...] Ghidorah performs best with a mask ratio of 0.4, which masks 40% of users. [...] Auxiliary Task Loss Weight α [...] Optimal performance is achieved when α is set to 0.5. [...] Gradient Steps δ during Test-Time Training [...] performance improves with an increasing number of steps, peaks at 15 gradient updates, and then begins to decline. |
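The splitting protocol quoted in the Dataset Splits row (a random 80% / 10% / 10% partition over cascades) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, seed, and use of Python's `random` module are assumptions.

```python
import random

def split_cascades(cascades, seed=42):
    """Randomly partition cascades into 80% train, 10% validation,
    10% test, as described in the paper's splitting protocol.
    (Function name and seed are illustrative, not from the paper.)"""
    rng = random.Random(seed)
    shuffled = list(cascades)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_cascades(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Note that with integer truncation the test set absorbs any remainder when the cascade count is not divisible by ten, which keeps the three sets an exact partition of the data.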