Efficient Streaming Algorithms for Graphlet Sampling

Authors: Yann Bourreau, Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, Mauro Sozio

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Our extensive evaluation on very large graphs shows the effectiveness of our algorithms." (Section 5, Experiments: "We implemented our algorithms in Python 3.9 and conducted experiments on an Ubuntu server with an Intel Xeon Silver 4108 CPU (1.80 GHz) and 28 GB RAM. The implementation can be found in the online repository.")
Researcher Affiliation: Academia.
- Yann Bourreau, CISPA Helmholtz Center for Information Security and Saarland University, Saarbrücken, Germany (yann.bourreau@cispa.de)
- Marco Bressan, Department of Computer Science, University of Milan, Milan, Italy (marco.bressan@unimi.it)
- T-H. Hubert Chan, Department of Computer Science, The University of Hong Kong, Hong Kong, China (hubert@cs.hku.hk)
- Qipeng Kuang, Department of Computer Science, The University of Hong Kong, Hong Kong, China (kuangqipeng@connect.hku.hk)
- Mauro Sozio, Institut Polytechnique de Paris, Télécom Paris, Palaiseau, France (sozio@telecom-paris.fr)
Pseudocode: Yes. The paper presents eight algorithms:
- Algorithm 1: Approximate DD order
- Algorithm 2: COMPUTEDISTRIB
- Algorithm 3: SAMPLE (abstract description)
- Algorithm 4: PREPROCESS
- Algorithm 5: CHECKEDGES
- Algorithm 6: SUBDEGREES
- Algorithm 7: SAMPLEEDGES
- Algorithm 8: Approximate DD order heuristically
Open Source Code: Yes. "The implementation can be found in the online repository." (Footnote 4: https://github.com/l2l7l9p/UGS-Streaming)
Open Datasets: Yes. "We run our experiments on several real-world graphs from the KONECT website [Kun13] and a synthetic random dense graph generated by drawing each edge with probability 0.8 (we name it Dense)." (Footnote 5: http://konect.cc/networks/)
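The synthetic "Dense" graph described above is an Erdős–Rényi-style random graph in which each possible edge is included independently with probability 0.8. A minimal sketch of how such an edge stream could be produced (the function name, node count, and seed are illustrative, not taken from the paper's repository):

```python
import random

def generate_dense_edge_stream(n, p=0.8, seed=0):
    """Yield the edges of an undirected random graph on n nodes,
    including each of the C(n, 2) possible edges independently
    with probability p (illustrative stand-in for the paper's
    Dense dataset)."""
    rng = random.Random(seed)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                yield (u, v)

edges = list(generate_dense_edge_stream(100))
# With p = 0.8, the expected edge count is 0.8 * C(100, 2) = 3960.
```

Generating the graph as a stream of edges, rather than materializing an adjacency structure, matches the streaming access model the algorithms are evaluated under.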
Dataset Splits: No. The paper does not describe explicit train/validation/test splits. The algorithms are streaming algorithms evaluated on entire graph datasets, and the experimental methodology focuses on performance metrics (e.g., number of passes, memory usage, number of sampling trials) rather than model training and validation splits.
Hardware Specification: Yes. "We implemented our algorithms in Python 3.9 and conducted experiments on an Ubuntu server with an Intel Xeon Silver 4108 CPU (1.80 GHz) and 28 GB RAM."
Software Dependencies: No. The paper states, "We implemented our algorithms in Python 3.9", but it does not provide version numbers for any other software libraries, frameworks, or solvers beyond the Python language version itself.
Experiment Setup: Yes. "We constrain the memory M by restricting the maximum number of edges that we can store during the computation of the DD order and sampling to be n/2 in all cases. In the sampling phase, at least 100 graphlets must be sampled successfully in order to reduce the error of the sampling probability. We study several metrics as a function of ϵ and k, such as the number of passes, the memory usage, and the number of sampling trials needed to obtain 100 graphlets."
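The "number of sampling trials to obtain 100 graphlets" metric counts how many trial draws a rejection-style sampler performs before 100 graphlets are accepted. A hedged sketch of that bookkeeping, where `sample_once` is a placeholder standing in for the paper's SAMPLE procedure and the acceptance probability below is purely illustrative:

```python
import random

def count_trials(sample_once, target=100, seed=0):
    """Repeat sampling trials until `target` graphlets are accepted;
    return the total number of trials performed. `sample_once(rng)`
    must return True on an accepted (successful) trial."""
    rng = random.Random(seed)
    successes = trials = 0
    while successes < target:
        trials += 1
        if sample_once(rng):
            successes += 1
    return trials

# Placeholder trial: accept with probability 0.25 (not the paper's
# actual acceptance probability, which depends on the graph, ε, and k).
trials = count_trials(lambda rng: rng.random() < 0.25)
```

Under this framing, a lower trial count for the same 100 accepted graphlets indicates a higher acceptance probability, which is why the metric is reported as a function of ε and k.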