A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time

Authors: Ranran Shen, Pan Peng

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental To validate our theoretical bounds, we conducted experiments on synthetic networks. ...To evaluate the performance of our oracle, we conducted experiments on the random graph generated by the Stochastic Block Model (SBM). ...We conducted an experiment on a SBM graph with k = 3, n = 15000, q = 0.002, p = 0.2. ...To evaluate the running time of our oracle, we conducted this experiment on some random graphs generated by SBM with n = 3000, k = 3, q = 0.002 and p [0.02, 0.06].
Researcher Affiliation Academia Ranran Shen ranranshen@mail.ustc.edu.cn Pan Peng ppeng@ustc.edu.cn School of Computer Science and Technology, University of Science and Technology of China, Hefei, China.
Pseudocode Yes Algorithm 1: CONSTRUCTORACLE(G, k, φ, ε, γ) ... Algorithm 2: SEARCHINDEX(H, ℓ, x) ... Algorithm 3: WHICHCLUSTER(G, x) ... Algorithm 4: RUNRANDOMWALKS(G, R, t, x) ... Algorithm 5: ESTIMATETRANSITIONMATRIX(G, IS, R, t) ... Algorithm 6: ESTIMATECOLLISIONPROBABILITIES(G, IS, R, t) ... Algorithm 7: INITIALIZEORACLE(G, δ, ξ) ... Algorithm 8: SPECTRALDOTPRODUCTORACLE(G, x, y, δ, ξ, D)
Open Source Code No The paper does not provide any explicit statements about releasing source code for their methodology or links to a code repository.
Open Datasets No The paper states: "To evaluate the performance of our oracle, we conducted experiments on the random graph generated by the Stochastic Block Model (SBM)." This indicates they generated their own data using a model, rather than using an existing publicly available dataset with a specific link or citation.
Dataset Splits No The paper describes generating graphs using the Stochastic Block Model and running queries on these graphs, but it does not specify explicit train/validation/test splits with percentages or sample counts for reproducibility of the data partitioning. It mentions "For each graph G = (V, E), we run WHICHCLUSTER(G, x) for all x V" which implies evaluating on the entire generated graph.
Hardware Specification Yes All experiments were implemented in Python 3.9 and the experiments were performed using an Intel(R) Core(TM) i7-12700F CPU @ 2.10GHz processor, with 32 GB RAM.
Software Dependencies No The paper states: "All experiments were implemented in Python 3.9". While Python version is provided, no specific libraries, frameworks, or other software components with their version numbers are mentioned, which would be necessary for full reproducibility.
Experiment Setup Yes In this model, we are given parameters p, q and n, k, where n, k denote the number of vertices and the number of clusters respectively; p denotes the probability that any pair of vertices within each cluster is connected by an edge, and q denotes the probability that any pair of vertices from different clusters is connected by an edge. ...In this experiment, we set k = 3, n = 3000, q = 0.002, p [0.02, 0.07] in the SBM, where each cluster has 1000 vertices.