reproducibilityindex.ai

A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time

Authors: Ranran Shen, Pan Peng

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To validate our theoretical bounds, we conducted experiments on synthetic networks. ...To evaluate the performance of our oracle, we conducted experiments on the random graph generated by the Stochastic Block Model (SBM). ...We conducted an experiment on a SBM graph with k = 3, n = 15000, q = 0.002, p = 0.2. ...To evaluate the running time of our oracle, we conducted this experiment on some random graphs generated by SBM with n = 3000, k = 3, q = 0.002 and p [0.02, 0.06].
Researcher Affiliation	Academia	Ranran Shen ranranshen@mail.ustc.edu.cn Pan Peng ppeng@ustc.edu.cn School of Computer Science and Technology, University of Science and Technology of China, Hefei, China.
Pseudocode	Yes	Algorithm 1: CONSTRUCTORACLE(G, k, φ, ε, γ) ... Algorithm 2: SEARCHINDEX(H, ℓ, x) ... Algorithm 3: WHICHCLUSTER(G, x) ... Algorithm 4: RUNRANDOMWALKS(G, R, t, x) ... Algorithm 5: ESTIMATETRANSITIONMATRIX(G, IS, R, t) ... Algorithm 6: ESTIMATECOLLISIONPROBABILITIES(G, IS, R, t) ... Algorithm 7: INITIALIZEORACLE(G, δ, ξ) ... Algorithm 8: SPECTRALDOTPRODUCTORACLE(G, x, y, δ, ξ, D)
Open Source Code	No	The paper does not provide any explicit statements about releasing source code for their methodology or links to a code repository.
Open Datasets	No	The paper states: "To evaluate the performance of our oracle, we conducted experiments on the random graph generated by the Stochastic Block Model (SBM)." This indicates they generated their own data using a model, rather than using an existing publicly available dataset with a specific link or citation.
Dataset Splits	No	The paper describes generating graphs using the Stochastic Block Model and running queries on these graphs, but it does not specify explicit train/validation/test splits with percentages or sample counts for reproducibility of the data partitioning. It mentions "For each graph G = (V, E), we run WHICHCLUSTER(G, x) for all x V" which implies evaluating on the entire generated graph.
Hardware Specification	Yes	All experiments were implemented in Python 3.9 and the experiments were performed using an Intel(R) Core(TM) i7-12700F CPU @ 2.10GHz processor, with 32 GB RAM.
Software Dependencies	No	The paper states: "All experiments were implemented in Python 3.9". While Python version is provided, no specific libraries, frameworks, or other software components with their version numbers are mentioned, which would be necessary for full reproducibility.
Experiment Setup	Yes	In this model, we are given parameters p, q and n, k, where n, k denote the number of vertices and the number of clusters respectively; p denotes the probability that any pair of vertices within each cluster is connected by an edge, and q denotes the probability that any pair of vertices from different clusters is connected by an edge. ...In this experiment, we set k = 3, n = 3000, q = 0.002, p [0.02, 0.07] in the SBM, where each cluster has 1000 vertices.