Efficient Streaming Algorithms for Graphlet Sampling

Authors: Yann Bourreau, Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, Mauro Sozio

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Our extensive evaluation on very large graphs shows the effectiveness of our algorithms." (Section 5, Experiments: "We implemented our algorithms in Python 3.9 and conducted experiments on an Ubuntu server with an Intel Xeon Silver 4108 CPU (1.80 GHz) and 28 GB RAM. The implementation can be found in the online repository.")
Researcher Affiliation: Academia.
- Yann Bourreau, CISPA Helmholtz Center for Information Security and Saarland University, Saarbrücken, Germany (yann.bourreau@cispa.de)
- Marco Bressan, Department of Computer Science, University of Milan, Milan, Italy (marco.bressan@unimi.it)
- T-H. Hubert Chan, Department of Computer Science, The University of Hong Kong, Hong Kong, China (hubert@cs.hku.hk)
- Qipeng Kuang, Department of Computer Science, The University of Hong Kong, Hong Kong, China (kuangqipeng@connect.hku.hk)
- Mauro Sozio, Institut Polytechnique de Paris, Télécom Paris, Palaiseau, France (sozio@telecom-paris.fr)
Pseudocode: Yes. The paper presents eight algorithms:
- Algorithm 1: Approximate DD order
- Algorithm 2: COMPUTEDISTRIB
- Algorithm 3: SAMPLE (abstract description)
- Algorithm 4: PREPROCESS
- Algorithm 5: CHECKEDGES
- Algorithm 6: SUBDEGREES
- Algorithm 7: SAMPLEEDGES
- Algorithm 8: Approximate DD order heuristically
Open Source Code: Yes. "The implementation can be found in the online repository." (Footnote 4: https://github.com/l2l7l9p/UGS-Streaming)
Open Datasets: Yes. "We run our experiments on several real-world graphs from the KONECT website [Kun13] and a synthetic random dense graph generated by drawing each edge with probability 0.8 (we name it Dense)." (Footnote 5: http://konect.cc/networks/)
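The synthetic "Dense" graph described above is an Erdős–Rényi-style random graph in which each possible edge is included independently with probability 0.8. A minimal sketch of how such an edge stream could be produced (the function name, node count, and seed are illustrative, not taken from the paper's repository):

```python
import random

def generate_dense_edge_stream(n, p=0.8, seed=0):
    """Yield the edges of an undirected random graph on n nodes,
    including each of the C(n, 2) possible edges independently
    with probability p (illustrative stand-in for the paper's
    Dense dataset)."""
    rng = random.Random(seed)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                yield (u, v)

edges = list(generate_dense_edge_stream(100))
# With p = 0.8, the expected edge count is 0.8 * C(100, 2) = 3960.
```

Generating the graph as a stream of edges, rather than materializing an adjacency structure, matches the streaming access model the algorithms are evaluated under.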
Dataset Splits: No. The paper does not describe explicit train/validation/test splits. The algorithms are streaming algorithms evaluated on entire graph datasets, and the experimental methodology focuses on performance metrics (e.g., number of passes, memory usage, number of sampling trials) rather than model training and validation splits.
Hardware Specification: Yes. "We implemented our algorithms in Python 3.9 and conducted experiments on an Ubuntu server with an Intel Xeon Silver 4108 CPU (1.80 GHz) and 28 GB RAM."
Software Dependencies: No. The paper states, "We implemented our algorithms in Python 3.9", but it does not provide version numbers for any other software libraries, frameworks, or solvers beyond the Python language version itself.
Experiment Setup: Yes. "We constrain the memory M by restricting the maximum number of edges that we can store during the computation of the DD order and sampling to be n/2 in all cases. In the sampling phase, at least 100 graphlets must be sampled successfully in order to reduce the error of the sampling probability. We study several metrics as a function of ϵ and k, such as the number of passes, the memory usage, and the number of sampling trials needed to obtain 100 graphlets."
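The "number of sampling trials to obtain 100 graphlets" metric counts how many trial draws a rejection-style sampler performs before 100 graphlets are accepted. A hedged sketch of that bookkeeping, where `sample_once` is a placeholder standing in for the paper's SAMPLE procedure and the acceptance probability below is purely illustrative:

```python
import random

def count_trials(sample_once, target=100, seed=0):
    """Repeat sampling trials until `target` graphlets are accepted;
    return the total number of trials performed. `sample_once(rng)`
    must return True on an accepted (successful) trial."""
    rng = random.Random(seed)
    successes = trials = 0
    while successes < target:
        trials += 1
        if sample_once(rng):
            successes += 1
    return trials

# Placeholder trial: accept with probability 0.25 (not the paper's
# actual acceptance probability, which depends on the graph, ε, and k).
trials = count_trials(lambda rng: rng.random() < 0.25)
```

Under this framing, a lower trial count for the same 100 accepted graphlets indicates a higher acceptance probability, which is why the metric is reported as a function of ε and k.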