Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learnable Spatial-Temporal Positional Encoding for Link Prediction

Authors: Katherine Tieu, Dongqi Fu, Zihao Li, Ross Maciejewski, Jingrui He

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 5, we evaluate the empirical performance of L-STEP extensively, including (1) the comprehensive effectiveness comparison with 13 classic datasets, 10 algorithms, 2 learning settings (transductive and inductive), and 3 sampling strategies, where our L-STEP performs the best across the board, (2) the verification of the role of MLPs and transformers upon L-STEP, (3) the learning robustness of L-STEP in terms of different initial positional encoding inputs, (4) parameter analysis, running time comparison, and ablation studies, and (5) the leading performance in the large-scale TGB open benchmark (Gastinger et al., 2024).
Researcher Affiliation	Collaboration	¹University of Illinois Urbana-Champaign, ²Meta AI, ³Arizona State University. Correspondence to: Jingrui He <EMAIL>.
Pseudocode	Yes	Appendix B: Pseudo-code of L-STEP
Open Source Code	Yes	Our code is available at https://github.com/kthrn22/L-STEP.
Open Datasets	Yes	We assess the ability of L-STEP in performing link prediction with 13 datasets covering various domains and collected by (Poursafaei et al., 2022): Wikipedia, Reddit, MOOC, Last FM, Enron, Social Evo., UCI, Flights, Can. Parl., US Legis., UN Trade, UN Vote, and Contact. Details about the dataset statistics are shown in Appendix G. We compare L-STEP with 8 state-of-the-art baselines... The large-scale datasets with pre-defined splits are publicly available at TGB Benchmark (Gastinger et al., 2024)
Dataset Splits	Yes	Across all 13 datasets, training/validation/testing sets are following the standard library (Yu et al., 2023) by chronological splits with ratios 70%/15%/15%.
Hardware Specification	Yes	The experiments are coded by Python and are performed on a Linux machine with a single NVIDIA Tesla V100 32GB GPU.
Software Dependencies	No	The experiments are coded by Python and are performed on a Linux machine with a single NVIDIA Tesla V100 32GB GPU. (No specific Python version or library versions mentioned.)
Experiment Setup	Yes	We first report the configuration and hyper-parameters that are unchanged for all 13 datasets: Dimension of time encoding: d_T = 100. Dimension of node encoding: d_N = 172. Dimension of edge encoding: d_E = 172. Dimension of positional encoding: d_P = 172 (only for Social Evo., d_P = 72). Hyper-parameters for time encoding function: α = 10, β = 10. Weight of negative samples in positional encoding loss: α_neg = 0.3. Weight of positional encoding loss in objective loss function of L-STEP: α_pe = 0.5. ... We leverage the Adam optimizer with learning rate of 0.0001. For a more detailed description of the model's implementation, computational resources, and configurations of hyper-parameters over all 13 datasets, we refer readers to Appendix I.2, I.3. ... We run L-STEP 5 times with different random seeds from 0 to 4 and report the average metric score.
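
The Dataset Splits and Experiment Setup rows describe a chronological 70%/15%/15% split and a metric averaged over seeds 0 to 4. The following is a minimal sketch of that protocol, not the authors' code or the library cited in the table. The function names `chronological_split` and `average_over_seeds` are hypothetical, and splitting by sorted event position (rather than by timestamp quantiles) is an assumption made for illustration.

```python
from statistics import mean


def chronological_split(timestamps, train_ratio=0.70, val_ratio=0.15):
    """Split event indices by time: earliest events train, latest events test.

    Assumption: the split is taken over events sorted by timestamp; the
    remaining share (here 15%) becomes the test set.
    """
    # Indices of events ordered from earliest to latest timestamp.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_train = round(len(order) * train_ratio)
    n_val = round(len(order) * val_ratio)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test


def average_over_seeds(run_fn, seeds=range(5)):
    """Call run_fn(seed) for each seed (0 to 4 by default) and average the scores."""
    return mean(run_fn(seed) for seed in seeds)
```

Here `run_fn` stands in for a full train-and-evaluate run that returns one metric score per seed; any such callable can be passed in.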