Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hitting and Commute Times in Large Random Neighborhood Graphs

Authors: Ulrike von Luxburg, Agnes Radl, Matthias Hein

JMLR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section we examine the convergence behavior of the commute distance in practice. We ran a large number of simulations, both on artificial and real-world data sets, in order to evaluate whether the rescaled commute distance in a graph is close to its predicted limit expression 1/d_u + 1/d_v. We conducted simulations for artificial graphs (random geometric graphs, planted partition graphs, preferential attachment graphs) and real-world data sets of various types and sizes (social networks, biological networks, traffic networks; up to 1.6 million vertices and 22 million edges). The general setup is as follows. Given a graph, we compute the pairwise resistance distance R_ij between all points. The relative deviation between the resistance distance and the predicted result is then given by RelDev(i, j) := |R_ij − 1/d_i − 1/d_j| / (1/d_i + 1/d_j). Note that we report relative rather than absolute deviations because this is more meaningful if R_ij fluctuates strongly on the graph. It also allows us to compare the behavior of different graphs with each other. In all figures, we then report the maximum, mean and median relative deviations.
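The deviation measure quoted above can be sketched in a few lines of NumPy, using the standard identity that the resistance distance is recoverable from the Moore-Penrose pseudoinverse of the graph Laplacian, R_ij = L+_ii + L+_jj − 2·L+_ij. This is a minimal illustration, not the authors' code: the function name `relative_deviation` is ours, and the normalization by the predicted value 1/d_i + 1/d_j follows the definition of a relative deviation as quoted.

```python
import numpy as np

def relative_deviation(A):
    """For every pair (i, j), compare the resistance distance R_ij with the
    predicted limit 1/d_i + 1/d_j and return the matrix of relative deviations.

    A: symmetric 0/1 adjacency matrix of a connected, unweighted graph.
    """
    d = A.sum(axis=1)                    # vertex degrees
    L = np.diag(d) - A                   # graph Laplacian
    Lp = np.linalg.pinv(L)               # Moore-Penrose pseudoinverse of L
    diag = np.diag(Lp)
    # Resistance distance: R_ij = L+_ii + L+_jj - 2 L+_ij
    R = diag[:, None] + diag[None, :] - 2 * Lp
    pred = 1.0 / d[:, None] + 1.0 / d[None, :]   # predicted limit 1/d_i + 1/d_j
    rel = np.abs(R - pred) / pred
    np.fill_diagonal(rel, 0.0)           # only off-diagonal pairs are meaningful
    return rel

# Example: the complete graph K4, where R_ij = 1/2 and 1/d_i + 1/d_j = 2/3,
# so every off-diagonal relative deviation equals 0.25.
A = np.ones((4, 4)) - np.eye(4)
rel = relative_deviation(A)
```

From the matrix `rel`, the maximum, mean, and median over off-diagonal entries give exactly the three summary statistics the excerpt reports.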
Researcher Affiliation Academia Ulrike von Luxburg (EMAIL), Department of Computer Science, University of Hamburg, Vogt-Koelln-Str. 30, 22527 Hamburg, Germany; Agnes Radl (EMAIL), Institute of Mathematics, University of Leipzig, Augustusplatz 10, 04109 Leipzig, Germany; Matthias Hein (EMAIL), Department of Mathematics and Computer Science, Saarland University, Campus E1 1, 66123 Saarbrücken, Germany
Pseudocode No The paper describes methods and proofs using mathematical notation and natural language, such as in "Construction of the flow overview" (Section 6.2). However, it does not contain any clearly labeled pseudocode blocks or algorithms formatted like code.
Open Source Code No The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets Yes We start with the class of random geometric graphs, which is very important for machine learning. We use a mixture of two Gaussian distributions on R^d. The first two dimensions contain a two-dimensional mixture of two Gaussians with varying separation (centers (−sep/2, 0) and (+sep/2, 0), covariance matrix 0.2·I, mixing weights 0.5 for both Gaussians). The remaining d − 2 dimensions contain Gaussian noise with variance 0.2 as well. From this distribution we draw n sample points. Based on this sample, we either compute the unweighted symmetric k-NN graph, the unweighted ε-graph or the Gaussian similarity graph. ... Now we consider the deviations for a couple of real-world data sets. We start with similarity graphs on real data, as they are often used in machine learning. As an example we use the full USPS data set of handwritten digits (9298 points in 256 dimensions) and consider the different forms of similarity graphs (k-NN, ε, Gaussian) with varying connectivity parameter. ... Figure 5: Relative error of the approximation in real-world networks. Downloaded from: 1, http://deim.urv.cat/~aarenas/data/welcome.htm and 2, http://www.cise.ufl.edu/research/sparse/matrices/. See text for details.
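The sampling and graph-construction step quoted above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the brute-force distance computation, and the exact parameter conventions (e.g., how ties are broken among nearest neighbors) are our assumptions.

```python
import numpy as np

def sample_mixture(n, d, sep, rng):
    """Draw n points from the mixture the excerpt describes: the first two
    dimensions hold a mixture of two Gaussians with centers (-sep/2, 0) and
    (+sep/2, 0), covariance 0.2*I, and equal weights; the remaining d - 2
    dimensions hold Gaussian noise with variance 0.2."""
    signs = rng.choice([-1.0, 1.0], size=n)          # mixture component per point
    X = rng.normal(scale=np.sqrt(0.2), size=(n, d))  # isotropic noise, variance 0.2
    X[:, 0] += signs * sep / 2.0                     # shift the first coordinate
    return X

def symmetric_knn_graph(X, k):
    """Unweighted symmetric k-NN graph: i ~ j if i is among the k nearest
    neighbors of j, or vice versa."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                      # exclude self-neighbors
    nn = np.argsort(D, axis=1)[:, :k]                # k nearest neighbors per point
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nn.ravel()] = 1.0  # directed k-NN edges
    return np.maximum(A, A.T)                        # symmetrize ("or" rule)

rng = np.random.default_rng(0)
X = sample_mixture(200, d=5, sep=2.0, rng=rng)
A = symmetric_knn_graph(X, k=10)
```

The matched ε-graph and Gaussian graph of the excerpt would reuse the same sample, with ε and σ set to the maximal k-nearest-neighbor distance.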
Dataset Splits No The paper describes graph generation parameters for artificial datasets and sampling strategies for real-world datasets (e.g., "draw 20 vertices at random"), but it does not specify explicit training/validation/test splits, which are typically used for model evaluation in machine learning tasks.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the simulations or experiments.
Software Dependencies No The paper does not list any specific software dependencies or their version numbers that were used for the implementation or analysis.
Experiment Setup Yes We use a mixture of two Gaussian distributions on R^d. The first two dimensions contain a two-dimensional mixture of two Gaussians with varying separation (centers (−sep/2, 0) and (+sep/2, 0), covariance matrix 0.2·I, mixing weights 0.5 for both Gaussians). The remaining d − 2 dimensions contain Gaussian noise with variance 0.2 as well. From this distribution we draw n sample points. Based on this sample, we either compute the unweighted symmetric k-NN graph, the unweighted ε-graph or the Gaussian similarity graph. In order to be able to compare the results between these three types of graphs we match the parameters of the different graphs: given some value k for the k-NN graph, we choose the values of ε for the ε-graph and σ for the Gaussian graph as the maximal k-nearest-neighbor distance in the data set. ... Next we consider graphs according to the planted partition model. We modeled a graph with n vertices and two equal-sized clusters with connectivity parameters p_within and p_between. ... In our simulation we generated preferential attachment graphs according to the following standard procedure: starting with a graph that consists of two vertices connected by an edge, in each time step we add a new vertex to the graph. The new vertex is connected by a fixed number of edges to existing vertices (this number is called NumLinks in the figures below). The target vertices of these edges are chosen randomly among all existing vertices, where the probability of connecting to a particular vertex is proportional to its degree.
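The preferential attachment procedure quoted above can be sketched as follows. This is an illustrative implementation, not the authors' code; in particular, we assume each new vertex attaches to distinct targets, which the excerpt does not specify.

```python
import random

def preferential_attachment_graph(n, num_links, seed=None):
    """Grow a graph by the standard procedure the excerpt describes: start with
    two vertices joined by an edge; each new vertex attaches num_links edges to
    existing vertices chosen with probability proportional to their degree."""
    rnd = random.Random(seed)
    edges = [(0, 1)]
    # Degree-weighted urn: each endpoint occurrence contributes one unit of
    # degree, so a uniform draw from the urn is degree-proportional sampling.
    urn = [0, 1]
    for v in range(2, n):
        m = min(num_links, v)             # cannot attach to more vertices than exist
        targets = set()
        while len(targets) < m:
            targets.add(rnd.choice(urn))  # degree-proportional choice, no repeats
        for t in targets:
            edges.append((v, t))
            urn.extend([v, t])            # both endpoints gain one unit of degree
    return edges

edges = preferential_attachment_graph(100, num_links=3, seed=0)
```

For the planted partition model of the excerpt, the analogous sketch would simply flip a p_within or p_between coin for each vertex pair depending on whether the two vertices share a cluster.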