Efficient Streaming Algorithms for Graphlet Sampling
Authors: Yann Bourreau, Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, Mauro Sozio
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our extensive evaluation on very large graphs shows the effectiveness of our algorithms." "We implemented our algorithms in Python 3.9 and conducted experiments on an Ubuntu server with an Intel Xeon Silver 4108 CPU (1.80 GHz) and 28 GB RAM. The implementation can be found in the online repository." |
| Researcher Affiliation | Academia | Yann Bourreau, CISPA Helmholtz Center for Information Security & Saarland University, Saarbrücken, Germany (yann.bourreau@cispa.de); Marco Bressan, Department of Computer Science, University of Milan, Milan, Italy (marco.bressan@unimi.it); T-H. Hubert Chan, Department of Computer Science, The University of Hong Kong, Hong Kong, China (hubert@cs.hku.hk); Qipeng Kuang, Department of Computer Science, The University of Hong Kong, Hong Kong, China (kuangqipeng@connect.hku.hk); Mauro Sozio, Institut Polytechnique de Paris, Télécom Paris, Palaiseau, France (sozio@telecom-paris.fr) |
| Pseudocode | Yes | Algorithm 1: Approximate DD Order; Algorithm 2: COMPUTEDISTRIB; Algorithm 3: SAMPLE (abstract description); Algorithm 4: PREPROCESS; Algorithm 5: CHECKEDGES; Algorithm 6: SUBDEGREES; Algorithm 7: SAMPLEEDGES; Algorithm 8: Approximate DD order heuristically |
| Open Source Code | Yes | "The implementation can be found in the online repository." (https://github.com/l2l7l9p/UGS-Streaming) |
| Open Datasets | Yes | "We run our experiments on several real-world graphs from the KONECT website [Kun13] (http://konect.cc/networks/) and a synthetic random dense graph generated by drawing each edge with probability 0.8 (we name it Dense)." |
| Dataset Splits | No | The paper does not describe explicit train/validation/test dataset splits. The proposed streaming algorithms are evaluated on entire graph datasets, and the experimental methodology focuses on algorithmic performance metrics (e.g., number of passes, memory usage, number of sampling trials) rather than model training with validation splits. |
| Hardware Specification | Yes | "We implemented our algorithms in Python 3.9 and conducted experiments on an Ubuntu server with an Intel Xeon Silver 4108 CPU (1.80 GHz) and 28 GB RAM." |
| Software Dependencies | No | The paper states, "We implemented our algorithms in Python 3.9", but it does not give version numbers for any other software libraries, frameworks, or solvers beyond the Python language version itself. |
| Experiment Setup | Yes | We constrain the memory M by restricting the maximum number of edges that we can store during the computation of the DD order and sampling to be n/2 in all cases. In the sampling phase it is required to sample at least 100 graphlets successfully in order to reduce the error of sampling probability. We study several metrics as a function of ϵ and k, such as the number of passes, the memory usage, as well as the number of sampling trials to obtain 100 graphlets. |
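The synthetic "Dense" graph in the Open Datasets row is described as drawing each edge independently with probability 0.8, i.e., an Erdős–Rényi G(n, p) graph. A minimal sketch of generating such a graph as an edge stream is shown below; the function name, parameters, and seed are illustrative, not taken from the paper's implementation.

```python
import random

def dense_graph_edges(n, p=0.8, seed=0):
    """Yield the edges of an Erdos-Renyi G(n, p) graph as a stream.

    Each of the n*(n-1)/2 possible undirected edges is included
    independently with probability p, matching the description of
    the synthetic "Dense" graph (p = 0.8).
    """
    rng = random.Random(seed)
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                yield (u, v)

# With n = 200 and p = 0.8, the expected number of edges is
# 0.8 * 200 * 199 / 2 = 15920.
edges = list(dense_graph_edges(200, p=0.8, seed=42))
print(len(edges))
```

Yielding edges one at a time (rather than materializing the whole graph) mirrors the streaming setting the paper's algorithms operate in, where edges arrive sequentially and memory is constrained.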