Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

New Parallel and Streaming Algorithms for Directed Densest Subgraph

Authors: Slobodan Mitrovic, Theodore Pan, Mahdi Qaempanah, Mohammad Amin Raeisi

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically evaluate our approaches in two ways. First, we illustrate that our single-pass semi-streaming algorithm performs much better than the theoretical guarantee. Specifically, its approximation on temporal datasets matches the (2 + ε)-approximation of an O(log n)-pass algorithm by Bahmani et al. [BKV12, VLDB 2012]. Second, we demonstrate that our MPC algorithm requires fewer rounds than prior work. 6 Experiments Baselines. We empirically evaluate the performance of our semi-streaming algorithm, comparing it to the (2+ε)-approximation O(log n) pass semi-streaming algorithm from [BKV12]. ... In Figure 2, we see that our algorithm matches [BKV12] for the largest density. We also observe that it is significantly less sensitive to error in z2, making it better in practice than [BKV12] when our approximation of z2 may not be as precise. Overall, we demonstrate that in practice our algorithm performs much better than the log n theoretical guarantee would imply. Figure 2: Density as a function of z2 for various temporal datasets. Figure 3: Update time between processing edges of the stream for various general datasets.
Researcher Affiliation	Academia	Slobodan Mitrovi c* University of California, Davis EMAIL Theodore Pan* University of California, Davis EMAIL Mahdi Qaempanah* Sharif University of Technology EMAIL Mohammad Amin Raeisi* Yale University EMAIL
Pseudocode	Yes	Algorithm 1 Computes a 2(1 + ε)-approximate directed densest subgraph assuming that D = ρ(S , T ) and z = p \|S \|/\|T \| Input: bipartite graph G = (S, T, E), parameters ε > 0, D and z 1: k S = D/(2z), k T = Dz/2 2: while true do 3: A all vertices v S with d G(v) < k S 4: B all vertices v T with d G(v) < k T 5: if \|S\| z2\|T\| and \|A\| ε 1+ε\|S\| then return (S, T) 6: if \|S\| z2\|T\| and \|B\| ε 1+ε\|T\| then return (S, T) 7: Remove A from S and B from T Algorithm 2 Finds partial (2 + ε)-approximation of directed densest subgraph Input: bipartite graph G = (S, T, E), ε (0, 1), δ (0, 1), D and z Algorithm 3 Finds O(log n)-approximation of directed densest subgraph Input: bipartite graph G = (S, T, E), ε > 0, D and z
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide access to the data and code in the supplemental material.
Open Datasets	Yes	Data. We use 10 datasets from the Stanford Large Network Dataset Collection [LK14]. [LK14] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
Dataset Splits	No	The paper describes how the data is handled (e.g., temporal graphs where edges are in sorted time order, general directed graphs with edges sorted by endpoints) but does not specify explicit training/test/validation splits. The algorithms are evaluated on the full datasets in a streaming context.
Hardware Specification	Yes	To run these experiments, we use an M1 machine running mac OS Sequoia 15.3.2 with 4 cores, 8 GB of RAM, 256 KB of L2 Cache, and 2.5 MB of L3 Cache (per core).
Software Dependencies	No	The paper mentions 'mac OS Sequoia 15.3.2' as the operating system, but does not specify other software dependencies, libraries, or frameworks with version numbers used for implementing or running the algorithms.
Experiment Setup	Yes	Results. We run the algorithms on temporal directed graphs with ε = 0.2. We also compare the update time of our algorithm to [MP24], using their threshold f = 1/450, in Figure 3. The chosen c are where the largest densities are attained, and the chosen ε are based on maximizing densities. We compare our MPC algorithm to the MPC algorithm for directed graphs from [BKV12]. Specifically, we look at their density plots as well as how many sublinear MPC rounds the algorithms take. We use δ = 0.4, 0.6, 0.8, where each machine has nδ memory, and ε = 0.6, the approximation parameter for both algorithms.