On Scalable Testing of Samplers

Authors: Yash Pote, Kuldeep S Meel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our claim by implementing our algorithm and then comparing it against Barbarik2. Our experiments on the samplers wUnigen3 and wSTS find that Barbarik3 requires 10× fewer samples for wUnigen3 and 450× fewer samples for wSTS as compared to Barbarik2. To evaluate the performance of Barbarik3 and test the quality of publicly available samplers, we implemented Barbarik3 in Python.
Researcher Affiliation | Academia | Yash Pote and Kuldeep S. Meel, School of Computing, National University of Singapore
Pseudocode | Yes | Algorithm 1 Barbarik3(P, Q, η, ε, δ)
Open Source Code | Yes | The accompanying open-source tool is available at https://github.com/meelgroup/barbarik
Open Datasets | Yes | Our dataset consists of the union of two n-dimensional product distributions, for n ∈ {4, 7, 10, . . . , 118}. We experiment on 87 constraints drawn from a collection of publicly available benchmarks arising from sampling and counting tasks (https://zenodo.org/record/3793090).
Dataset Splits | No | The paper describes the datasets used for evaluation but does not specify train/validation/test splits, as the experiments involve testing pre-existing samplers rather than training new models.
Hardware Specification | Yes | Our experiments were conducted on a high-performance compute cluster with Intel Xeon E5-2690 v3 @ 2.60 GHz CPU cores. We use a single core with 4 GB of memory and a timeout of 16 hours for each benchmark.
Software Dependencies | No | We implemented Barbarik3 in Python.
Experiment Setup | Yes | For the closeness (ε), farness (η), and confidence (δ) parameters, we choose the values 0.05, 0.9, and 0.2. This setting implies that for a given distribution P and a given sampler Q(ϕ, w), Barbarik3 returns (1) Accept if d_TV(P, Q(ϕ, w)) < 0.05, and (2) Reject if d_TV(P, Q(ϕ, w)) > 0.9, with probability at least 0.8. We set a limit of 10^8 samples for our experiments due to our limited computational resources.
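The accept/reject guarantee above is stated in terms of total variation (TV) distance. The following is a minimal illustrative sketch of that decision rule on fully known finite distributions, not the Barbarik3 algorithm itself (which works from samples); the function names and thresholds used here are our own labels, with ε = 0.05 and η = 0.9 taken from the setup described above.

```python
import numpy as np

# Thresholds from the reported experiment setup (assumed roles:
# EPSILON = closeness parameter ε, ETA = farness parameter η).
EPSILON = 0.05
ETA = 0.9

def tv_distance(p, q):
    """Total variation distance between two discrete distributions:
    d_TV(P, Q) = (1/2) * sum_x |P(x) - Q(x)|."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

def decide(p, q):
    """Idealized accept/reject rule: Accept when the distributions are
    ε-close, Reject when they are η-far; in between, a tester gives
    no guarantee either way."""
    d = tv_distance(p, q)
    if d < EPSILON:
        return "Accept"
    if d > ETA:
        return "Reject"
    return "No guarantee"

# Example: a sampler whose output distribution is very close to uniform.
p = [0.25, 0.25, 0.25, 0.25]
q = [0.26, 0.24, 0.25, 0.25]
print(decide(p, q))  # d_TV = 0.01 < 0.05, so "Accept"
```

A sample-based tester such as Barbarik3 must reach the same verdict with probability at least 1 − δ (here 0.8) while only drawing samples from Q, which is why the sample budget (10^8) is the key resource being measured.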