On Scalable Testing of Samplers

Authors: Yash Pote, Kuldeep S Meel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our claim by implementing our algorithm and then comparing it against Barbarik2. Our experiments on the samplers wUnigen3 and wSTS find that Barbarik3 requires 10× fewer samples for wUnigen3 and 450× fewer samples for wSTS as compared to Barbarik2. To evaluate the performance of Barbarik3 and test the quality of publicly available samplers, we implemented Barbarik3 in Python.
Researcher Affiliation | Academia | Yash Pote and Kuldeep S. Meel, School of Computing, National University of Singapore
Pseudocode | Yes | Algorithm 1 Barbarik3(P, Q, η, ε, δ)
Open Source Code | Yes | The accompanying open-source tool is available at https://github.com/meelgroup/barbarik
Open Datasets | Yes | Our dataset consists of the union of two n-dimensional product distributions, for n ∈ {4, 7, 10, . . . , 118}. We experiment on 87 constraints drawn from a collection of publicly available benchmarks arising from sampling and counting tasks (https://zenodo.org/record/3793090).
Dataset Splits | No | The paper describes the datasets used for evaluation but does not specify train/validation/test splits, as the experiments involve testing pre-existing samplers rather than training new models.
Hardware Specification | Yes | Our experiments were conducted on a high-performance compute cluster with Intel Xeon E5-2690 v3 @ 2.60 GHz CPU cores. We use a single core with 4 GB of memory and a timeout of 16 hours for each benchmark.
Software Dependencies | No | We implemented Barbarik3 in Python.
Experiment Setup | Yes | For the closeness (ε), farness (η), and confidence (δ) parameters, we choose the values 0.05, 0.9, and 0.2. This setting implies that for a given distribution P and a given sampler Q(ϕ, w), Barbarik3 returns (1) Accept if d_TV(P, Q(ϕ, w)) < 0.05, and (2) Reject if d_TV(P, Q(ϕ, w)) > 0.9, with probability at least 0.8. We set a limit of 10^8 samples for our experiments due to our limited computational resources.
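The accept/reject guarantee above is stated in terms of total variation (TV) distance. The following is a minimal illustrative sketch of that decision rule on fully known finite distributions, not the Barbarik3 algorithm itself (which works from samples); the function names and thresholds used here are our own labels, with ε = 0.05 and η = 0.9 taken from the setup described above.

```python
import numpy as np

# Thresholds from the reported experiment setup (assumed roles:
# EPSILON = closeness parameter ε, ETA = farness parameter η).
EPSILON = 0.05
ETA = 0.9

def tv_distance(p, q):
    """Total variation distance between two discrete distributions:
    d_TV(P, Q) = (1/2) * sum_x |P(x) - Q(x)|."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

def decide(p, q):
    """Idealized accept/reject rule: Accept when the distributions are
    ε-close, Reject when they are η-far; in between, a tester gives
    no guarantee either way."""
    d = tv_distance(p, q)
    if d < EPSILON:
        return "Accept"
    if d > ETA:
        return "Reject"
    return "No guarantee"

# Example: a sampler whose output distribution is very close to uniform.
p = [0.25, 0.25, 0.25, 0.25]
q = [0.26, 0.24, 0.25, 0.25]
print(decide(p, q))  # d_TV = 0.01 < 0.05, so "Accept"
```

A sample-based tester such as Barbarik3 must reach the same verdict with probability at least 1 − δ (here 0.8) while only drawing samples from Q, which is why the sample budget (10^8) is the key resource being measured.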