Practical Methods for Graph Two-Sample Testing

Authors: Debarghya Ghoshdastidar, Ulrike von Luxburg

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this paper, we consider the problem of two-sample testing of large graphs. We demonstrate the practical merits and limitations of existing theoretical tests and their bootstrapped variants. We also propose two new tests based on asymptotic distributions. We show that these tests are computationally less expensive and, in some cases, more reliable than the existing methods. ... We present our numerical results in three groups: (i) results for random graphs for m > 1, (ii) results for random graphs for m = 1, and (iii) results for testing real networks.
Researcher Affiliation Academia Debarghya Ghoshdastidar Department of Computer Science University of Tübingen ghoshdas@informatik.uni-tuebingen.de Ulrike von Luxburg Department of Computer Science University of Tübingen Max Planck Institute for Intelligent Systems luxburg@informatik.uni-tuebingen.de
Pseudocode No The paper refers to Algorithm 1 in Tang et al. (2016) but does not provide its own pseudocode or algorithm blocks.
Open Source Code Yes Our aim is also to make the existing and proposed tests more accessible for applied research. We provide Matlab implementations of the tests in the supplementary material. ... Matlab codes are provided in the supplementary.1 Also available at: https://github.com/gdebarghya/Network-Two Sample Testing.
Open Datasets Yes In the first setup, we consider moderate sized graphs (n = 178) constructed by thresholding autocorrelation matrices of EEG recordings (Andrzejak et al., 2001, Dua and Taniskidou, 2017). ... We also study networks corresponding to peering information of autonomous systems, that is, graphs defined on the routers comprising the Internet with the edges representing who-talks-to-whom (Leskovec et al., 2005, Leskovec and Krevl, 2014).
Dataset Splits No The paper does not provide specific details on how datasets are split into training, validation, or test sets for reproducibility.
Hardware Specification No The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU, GPU models, memory).
Software Dependencies No The paper mentions 'Matlab implementations' but does not specify version numbers for Matlab or any other software libraries or dependencies.
Experiment Setup Yes We let n grow from 100 to 1000 in steps of 100, and set p = 0.1 and q = 0.05. We set ϵ = 0 and 0.04 for null and alternative hypotheses, respectively. We use two values of population size, m {2, 4}, and fix the significance level at α = 5%. ... We use r {2, 4} to specify the rank parameter for bootstrap tests, and also as the number of blocks used for community detection step of Asymp-TW.