Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Statistical Guarantees for Local Spectral Clustering on Random Neighborhood Graphs

Authors: Alden Green, Sivaraman Balakrishnan, Ryan J. Tibshirani

JMLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide empirical support for our theory. Keywords: graphs, spectral clustering, Personalized Page Rank, density clustering, unsupervised learning... We provide numerical experiments to investigate the tightness of our theoretical results in Section 3, and compare the performance of PPR with a density clustering algorithm on the two moons dataset.
Researcher Affiliation | Academia | Alden Green (EMAIL), Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213; Sivaraman Balakrishnan (EMAIL), Department of Statistics and Data Science and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213; Ryan J. Tibshirani (EMAIL), Department of Statistics and Data Science and Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
Pseudocode | Yes | Algorithm 1 (PPR on a neighborhood graph). Input: data X = {x1, . . . , xn}, radius r > 0, teleportation parameter α ∈ [0, 1], seed v ∈ X, sweep cut range (L, U). Output: cluster estimate Ĉ ⊆ V. 1: Form the neighborhood graph G_{n,r}. 2: Compute the PPR vector p_v = p(v, α; G_{n,r}) as in (1). 3: Compute sweep cuts S_β as in (3), for each β ∈ (L, U). 4: Return Ĉ = S_{β*}, where β* = argmin_{β ∈ (L, U)} Φ(S_β; G_{n,r}).
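The quoted Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes the standard power-iteration form of PPR, p ← α·e_v + (1 − α)·pW with W the lazy random walk, in place of the paper's equation (1), and a conductance sweep over p_i/deg(i) in place of equation (3).

```python
import numpy as np

def ppr_sweep_cut(X, r, alpha, v, n_iter=200):
    """Sketch of Algorithm 1: PPR plus sweep cut on an r-neighborhood graph.

    Assumes every vertex has at least one neighbor (the paper chooses r
    so that each vertex has at least 2 neighbors).
    """
    n = len(X)
    # Step 1: form the neighborhood graph G_{n,r} (edge iff distance <= r).
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = ((D2 <= r * r) & ~np.eye(n, dtype=bool)).astype(float)
    deg = A.sum(axis=1)
    W = 0.5 * (np.eye(n) + A / deg[:, None])  # lazy random walk matrix
    # Step 2: PPR vector via power iteration (assumed form of eq. (1)).
    e_v = np.zeros(n)
    e_v[v] = 1.0
    p = e_v.copy()
    for _ in range(n_iter):
        p = alpha * e_v + (1 - alpha) * p @ W
    # Steps 3-4: sweep vertices in decreasing order of p_i / deg(i),
    # return the prefix set with minimum conductance Phi.
    order = np.argsort(-p / deg)
    vol = deg.sum()
    in_S = np.zeros(n, dtype=bool)
    cut, best_phi, best = 0.0, np.inf, None
    for k in range(n - 1):  # proper, nonempty subsets only
        i = order[k]
        cut += deg[i] - 2.0 * A[i, in_S].sum()  # update boundary edge count
        in_S[i] = True
        vol_S = deg[in_S].sum()
        phi = cut / min(vol_S, vol - vol_S)  # conductance of the sweep set
        if phi < best_phi:
            best_phi, best = phi, in_S.copy()
    return best, best_phi
```

On two well-separated point clouds the sweep recovers the seed's connected component with conductance zero, since the PPR mass never leaves the seed's component.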
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is provided or give a link to a code repository. The license information refers to the paper content, not source code.
Open Datasets | Yes | In Figure 3, to drive home the implications of Sections 3 and 4, we compare the behavior of PPR and the density clustering algorithm of Chaudhuri and Dasgupta (2010) on the well-known two moons dataset (with added 2d Gaussian noise)... F.2 Experimental Settings for Figure 3: To form each of the three rows in Figure 3, n = 800 points are independently sampled following a two moons plus Gaussian noise model.
Dataset Splits | No | The paper describes how synthetic datasets (mixture of uniform distributions, two moons dataset) were generated with specific parameters and sample sizes (e.g., n = 8000, n = 800). However, it does not specify any training, validation, or test splits for these generated datasets, as the experiments involve simulations rather than partitioning an existing dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers (e.g., programming languages, libraries, frameworks, or solvers) that were used to conduct the experiments.
Experiment Setup | Yes | F.1 Experimental Settings for Figure 2: Let R_{σ,ρ} = [−σ/2, σ/2] × [−ρ/2, ρ/2] be the two-dimensional rectangle of width σ and height ρ, centered at the origin. We sample n = 8000 points according to the density function f_{ρ,σ,λ}, defined over domain X = [−1, 1]² and parameterized by ρ, σ and λ as follows: ... In the first row, we fix σ = .25, (λ − θ)/λ = .25, and vary ρ from .25 to 2. In the second row, we fix ρ = 1.8, (λ − θ)/λ = .05, and vary σ from .1 to .2. In the third row, we fix ρ = .5, σ = .25 and vary (λ − θ)/λ from .1 to .25. In the first and third rows, we take r = σ/8; in the second row, where we vary σ, we take r = .1/8. F.2 Experimental Settings for Figure 3: ... µ1 = (−.5, 0), µ2 = (0, 0), ϵ ∼ N(0, I₂) (row 1); µ1 = (−.5, .07), µ2 = (0, .07), ϵ ∼ N(0, I₂) (row 2); µ1 = (−.5, .125), µ2 = (0, .125), ϵ ∼ N(0, I₂) (row 3), for I_d the d × d identity matrix. In all cases σ = .07. In each case λ is taken as small as possible such that there exist exactly two distinct density clusters, which we call C_λ and C′_λ; r is taken as small as possible so that each vertex has at least 2 neighbors.
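The F.2 sampler can be mimicked as follows. This is a sketch under stated assumptions: the excerpt does not quote the exact moon parameterization, so unit semicircles translated by µ1 and µ2 (with the second moon reflected) are a guess, as is the name `two_moons`; only n = 800, σ = .07, the shift vectors, and the ϵ ∼ N(0, I₂) noise scaled by σ come from the quoted text.

```python
import numpy as np

def two_moons(n, mu1, mu2, sigma, rng):
    """Hypothetical two-moons-plus-Gaussian-noise sampler (cf. Appendix F.2).

    The moon shapes (unit semicircles, second one reflected) are an
    assumption; the excerpt only specifies n, mu1, mu2, sigma, and
    eps ~ N(0, I_2) noise.
    """
    half = n // 2
    t = rng.uniform(0.0, np.pi, n)  # angles parameterizing the arcs
    # Upper semicircle translated by mu1; reflected semicircle by mu2.
    upper = np.c_[np.cos(t[:half]), np.sin(t[:half])] + np.asarray(mu1, float)
    lower = np.c_[np.cos(t[half:]), -np.sin(t[half:])] + np.asarray(mu2, float)
    X = np.vstack([upper, lower])
    return X + sigma * rng.normal(size=(n, 2))  # add sigma * eps, eps ~ N(0, I_2)
```

For example, row 1 of Figure 3 would correspond to `two_moons(800, (-.5, 0), (0, 0), .07, np.random.default_rng(1))`.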