On Scalable Testing of Samplers
Authors: Yash Pote, Kuldeep S Meel
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our claim by implementing our algorithm and then comparing it against Barbarik2. Our experiments on the samplers wUnigen3 and wSTS find that Barbarik3 requires 10× fewer samples for wUnigen3 and 450× fewer samples for wSTS as compared to Barbarik2. (Section 4, Evaluation) To evaluate the performance of Barbarik3 and test the quality of publicly available samplers, we implemented Barbarik3 in Python. |
| Researcher Affiliation | Academia | Yash Pote, Kuldeep S. Meel; School of Computing, National University of Singapore |
| Pseudocode | Yes | Algorithm 1 Barbarik3(P, Q, η, ε, δ) |
| Open Source Code | Yes | The accompanying tool, available open source, can be found at https://github.com/meelgroup/barbarik |
| Open Datasets | Yes | Our dataset consists of the union of two n-dimensional product distributions, for n ∈ {4, 7, 10, ..., 118}. We experiment on 87 constraints drawn from a collection of publicly available benchmarks arising from sampling and counting tasks (footnote 5: https://zenodo.org/record/3793090). |
| Dataset Splits | No | The paper describes datasets used for evaluation but does not specify train/validation/test splits, as the experiments involve testing pre-existing samplers rather than training new models. |
| Hardware Specification | Yes | Our experiments were conducted on a high-performance compute cluster with Intel Xeon(R) E5-2690 v3 @ 2.60GHz CPU cores. We use a single core with 4GB memory with a timeout of 16 hours for each benchmark. |
| Software Dependencies | No | We implemented Barbarik3 in Python. |
| Experiment Setup | Yes | For the closeness (ε), farness (η), and confidence (δ) parameters, we choose the values 0.05, 0.9 and 0.2. This setting implies that for a given distribution P, and for a given sampler Q(ϕ, w), Barbarik3 returns (1) Accept if d_∞(P, Q(ϕ, w)) < 0.05, and (2) Reject if d_TV(P, Q(ϕ, w)) > 0.9, with probability at least 0.8. We set a sample limit of 10^8 samples for our experiments due to our limited computational resources. |
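
The guarantee quoted in the Experiment Setup row can be illustrated on small, explicitly enumerated distributions. The sketch below is not Barbarik3 and draws no samples; it only computes which verdict the stated thresholds promise for a given pair of distributions. The function names are hypothetical, and the closeness measure is assumed to be the multiplicative distance max over x of |Q(x)/P(x) - 1|, which the excerpt above does not define.

```python
import math

# Parameter values quoted in the Experiment Setup row.
EPSILON, ETA, DELTA = 0.05, 0.9, 0.2  # closeness, farness, confidence


def tv_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def mult_distance(p, q):
    """Assumed closeness measure: max over x of |q(x)/p(x) - 1|.

    Returns infinity if q puts mass where p has none.
    """
    worst = 0.0
    for x in set(p) | set(q):
        px, qx = p.get(x, 0.0), q.get(x, 0.0)
        if px == 0.0:
            if qx > 0.0:
                return math.inf
            continue
        worst = max(worst, abs(qx / px - 1.0))
    return worst


def promised_verdict(p, q, eps=EPSILON, eta=ETA):
    """Verdict the tester is required to give (with probability >= 1 - DELTA).

    Between the two thresholds the guarantee says nothing, so either
    answer would be permitted.
    """
    if mult_distance(p, q) < eps:
        return "Accept"
    if tv_distance(p, q) > eta:
        return "Reject"
    return "no guarantee"


# Hypothetical toy distributions over 20 outcomes.
uniform = {i: 1 / 20 for i in range(20)}
point_mass = {0: 1.0}
mild_skew = {i: (0.075 if i < 10 else 0.025) for i in range(20)}

print(promised_verdict(uniform, uniform))     # identical distributions
print(promised_verdict(uniform, point_mass))  # TV distance is about 0.95
print(promised_verdict(uniform, mild_skew))   # TV distance is about 0.25
```

With ε = 0.05 and η = 0.9 the gap between the two thresholds is wide, so many samplers of intermediate quality fall in the region where neither verdict is guaranteed.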