Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Streaming Stochastic Submodular Maximization with On-Demand User Requests

Authors: Honglian Wang, Sijing Tu, Lutz Oettershagen, Aristides Gionis

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate the performance of our proposed algorithms STORM and STORM++. We denote STORM(T) and STORM(T ) as the STORM algorithm with the input parameter set to T (i.e., exact number of user accesses) and T (i.e., upper bound of user accesses), resp. We design our experiments to answer the following research questions: (Q1) How effective are the solutions provided by STORM++ and STORM(T ) compared to other baselines and to each other? (Q2) What is the impact of parameters δ and k on the performance of STORM++ and STORM(T )? (Q3) How sensitive are STORM++ and STORM(T ) to the upper bound T and do our algorithms require highly-accurate estimation of T to perform well?
Researcher Affiliation	Academia	Honglian Wang KTH Royal Institute of Technology Digital Futures Stockholm, Sweden Sijing Tu KTH Royal Institute of Technology Digital Futures Stockholm, Sweden Lutz Oettershagen University of Liverpool Liverpool, UK Aristides Gionis KTH Royal Institute of Technology Digital Futures Stockholm, Sweden
Pseudocode	Yes	Algorithm 1: STORM Input: Estimated visits T N, budget k N. 1 A1, , AT , T = {1, , T }, G0 , i 0 2 for (V, τ, p) in the stream do 3 for j T do 4 if \|Aj\| < k then 5 Aj Aj {V } ... Algorithm 2: STORM++ Input: Estimated visits T N, budget k N, parameter δ N. 1 G0 , B0 , i 0, P n δ, 2δ, . . . , T 2 for T P do Initialize STORM ( T, k) Maintain STORM for different T 3 for (V, τ, p) in the stream do 4 for T P do 5 Gi+1,( T ) STORM ( T, k).step(V, τ, p) Process (V, τ, p) by executing lines 3-15 of Alg 1 ... Algorithm 3: LMGREEDY Input: Budget k N. 1 G0, , GT , G , V1 , t = 1 2 for (V, τ, p) in the stream do 3 Vt Vt {V } 4 if τ = 1 then ...
Open Source Code	Yes	The source code and datasets are available.2 https://github.com/Hong LWang/Streaming-Submodular-maximization-with-on-demand-user-request
Open Datasets	Yes	Datasets: We use six real-world datasets, including four rating datasets Kuai Rec [14], Anime, Beer and Yahoo, and two multi-label dataset RCV1 [28] and Amazon [30]. The number of items these dataset contains vary from thousands to one million, and the topic sets size vary from around 40 to around 4000. We provide the details in Appendix D.1.
Dataset Splits	Yes	Experimental setting: To simulate the news item stream, we randomly shuffle the items to form a stream of length N. To simulate the user visit sequence, we start with an all-zeros sequence and then randomly select T indices, and set those positions to one. We establish an upper bound on the number of visits by setting T = T + T for some positive T.
Hardware Specification	Yes	All experiments ran on a Mac Book Air (Model Mac15,12) equipped with an Apple M3 chip (8-core: 4 performance and 4 efficiency cores), 16 GB of memory, and a solid-state drive (SSD).
Software Dependencies	No	The paper does not explicitly list specific software packages with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). It mentions using approaches from other papers (e.g., matrix factorization [25], STOCHASTIC-GREEDY by Mirzasoleiman et al. [35], and sampling from Feldman et al. [10]) but not the specific software implementations or versions used for these.
Experiment Setup	Yes	We design our experiments to answer the following research questions: (Q1) How effective are the solutions provided by STORM++ and STORM(T ) compared to other baselines and to each other? (Q2) What is the impact of parameters δ and k on the performance of STORM++ and STORM(T )? (Q3) How sensitive are STORM++ and STORM(T ) to the upper bound T and do our algorithms require highly-accurate estimation of T to perform well? Experimental setting: To obtain the click probabilities for each item, we generate them uniformly at random in the range [0, 0.2] for the RCV1 and Amazon datasets. For the four item-rating datasets, we estimate click probabilities by computing a low-rank completion of the user-item rating matrix using matrix factorization [25]. This yields latent feature vectors wu for each user u and vm for each item m, where the inner product w u vm approximates user u s rating for item m. We then apply standard min-max normalization to the predicted ratings and linearly scale the results to the range [0, 0.5] to obtain click probabilities. We set the latent feature dimension to 15 for the Kuai Rec dataset, and to 20 for the remaining three rating datasets. While click probability ranges can vary in real-world scenarios, where larger ranges can lead to faster convergence toward the maximum expected coverage value, we intentionally set the probabilities to be small in our experiments. This allows us to observe performance changes over a wider range of parameter settings. In all experiments, we randomly selected 50 users and report the average expected coverage for the Yahoo, RCV1, and Amazon datasets. Standard deviations and expected coverage results for the other three datasets are provided in Appendix D. We also report runtime and memory usage on the largest dataset, Amazon.