Sharp Impossibility Results for Hyper-graph Testing

Authors: Jiashun Jin, Zheng Tracy Ke, Jiajun Liang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use simulated data to validate our theoretical results. Fix (n, K, m) = (500, 2, 3). In Experiment 1, we consider the SBM model and verify that the Regions of Impossibility are different for symmetric and asymmetric SBM (see Section 2.4). Let θ_i = n^{-1/2} for 1 ≤ i ≤ n, and P_{ijk} = 1 if i = j = k and P_{ijk} = 1/4 otherwise. We consider a symmetric case where each community has 250 nodes and an asymmetric case where the two communities have 375 and 125 nodes, respectively. For each setting, we randomly generate the hypergraphs, apply the degree-based χ²-statistic ψ_n in Section 2.4, and repeat 500 times. The histograms of ψ_n for the two cases are on the left panel of the figure below (green: symmetric alternative; red: asymmetric alternative; blue: density of N(0, 1)).
Researcher Affiliation | Academia | Jiashun Jin, Carnegie Mellon University, jiashun@stat.cmu.edu; Zheng Tracy Ke, Harvard University, zke@fas.harvard.edu; Jiajun Liang, Purdue University, liangjj@purdue.edu
Pseudocode | No | The paper describes algorithms and mathematical formulations for its proposed tests (e.g., in Section 3.1, computing η(m) and ϕn), but it does not present them in a structured pseudocode block or a clearly labeled algorithm section.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology or include links to code repositories.
Open Datasets | No | The paper uses simulated data for its experiments, as stated in Section 4: "We use simulated data to validate our theoretical results." It does not mention or provide access to any publicly available or open datasets.
Dataset Splits | No | The paper uses simulated data and describes the parameters of the simulation (e.g., n = 500, K = 2, m = 3, node community assignments, θ values). However, it does not explicitly provide information about training, validation, or test splits, as would be typical for experiments on real-world datasets.
Hardware Specification | No | The paper mentions "simulated data" for its numerical studies but does not specify any particular hardware (e.g., GPU models, CPU types, memory, or cloud instances) used to run these simulations.
Software Dependencies | No | The paper does not provide specific names and version numbers for any software libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | Fix (n, K, m) = (500, 2, 3). In Experiment 1, we consider the SBM model and verify that the Regions of Impossibility are different for symmetric and asymmetric SBM (see Section 2.4). Let θ_i = n^{-1/2} for 1 ≤ i ≤ n, and P_{ijk} = 1 if i = j = k and P_{ijk} = 1/4 otherwise. We consider a symmetric case where each community has 250 nodes and an asymmetric case where the two communities have 375 and 125 nodes, respectively. For each setting, we randomly generate the hypergraphs, apply the degree-based χ²-statistic ψ_n in Section 2.4, and repeat 500 times.
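
Since no code was released, the following is a minimal Python sketch of how the Experiment 1 simulation could be set up. It is not the authors' implementation. It assumes a 3-uniform hypergraph in which each triple of distinct nodes forms a hyperedge independently with probability θ_i θ_j θ_k P, where P = 1 when all three nodes share a community and 1/4 otherwise, and it reads the garbled exponent as θ_i = n^(-1/2) (an assumption that keeps every probability below 1). The paper's ψ_n is defined in its Section 2.4 and is not reproduced in this summary, so `degree_dispersion` below uses the classical Poisson index-of-dispersion statistic as a stand-in. The function names `simulate_degrees` and `degree_dispersion` are hypothetical.

```python
import itertools
import math
import random


def simulate_degrees(sizes, seed=0):
    """Draw one 3-uniform hypergraph and return the degree of every node.

    sizes: community sizes, e.g. (250, 250) or (375, 125).
    Assumption: theta_i = n ** -0.5 for every node (reconstructed exponent).
    """
    rng = random.Random(seed)
    n = sum(sizes)
    # Community label of each node, assigned in blocks.
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    base = (n ** -0.5) ** 3  # theta_i * theta_j * theta_k
    degrees = [0] * n
    for i, j, k in itertools.combinations(range(n), 3):
        same = labels[i] == labels[j] == labels[k]
        p = base if same else base / 4  # P = 1 within a community, else 1/4
        if rng.random() < p:
            degrees[i] += 1
            degrees[j] += 1
            degrees[k] += 1
    return degrees


def degree_dispersion(degrees):
    """Standardized Poisson index of dispersion of the degrees.

    Stand-in only: this is NOT the paper's psi_n. The raw statistic
    sum((d - mean)^2) / mean is centered by n - 1 and scaled by
    sqrt(2 * (n - 1)).
    """
    n = len(degrees)
    mean = sum(degrees) / n
    if mean == 0:
        raise ValueError("all degrees are zero; statistic is undefined")
    chi2 = sum((d - mean) ** 2 for d in degrees) / mean
    return (chi2 - (n - 1)) / math.sqrt(2 * (n - 1))
```

To mimic the described setup, one would call `simulate_degrees((250, 250), seed=s)` and `simulate_degrees((375, 125), seed=s)` for 500 different seeds and collect the resulting statistics into histograms. The pure-Python loop visits about 20.7 million triples per hypergraph at n = 500, so a vectorized version would be preferable for the full 500 repetitions.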