Private and Non-private Uniformity Testing for Ranking Data

Authors: Róbert Busa-Fekete, Dimitris Fotakis, Emmanouil Zampetakis

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We carry out large-scale experiments, including m = 10,000, to show that our uniformity testing algorithms scale gracefully with m." and, from Section 7 (Experiments), "We shall present synthetic experiments to assess the performance of the proposed tests."
Researcher Affiliation | Collaboration | Róbert Busa-Fekete, Google Research, New York, USA (busarobi@google.com); Dimitris Fotakis, National Technical University of Athens, Greece (fotakis@cs.ntua.gr); Manolis Zampetakis, University of California, Berkeley, USA (mzampet@berkeley.edu)
Pseudocode | Yes | Algorithm 1 (2SAMP: Uniformity Test with Two Samples), Algorithm 2 (Uniformity Test, UNIF), Algorithm 3 (Central DP Uniformity Test, TRUN), Algorithm 5, Algorithm 6
Open Source Code | Yes | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | No | "We shall present synthetic experiments to assess the performance of the proposed tests." and "We used synthetic data." No information is provided for public access to the synthetic data itself, nor for the exact generation process needed to reproduce it as a dataset.
Dataset Splits | No | The paper discusses sample complexity for statistical tests and uses synthetic data, but does not mention any training, validation, or test dataset splits.
Hardware Specification | No | "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] We used data centers to compute the experiments. I believe that it is not so relevant to this work how long the computation did take."
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for the key software components used in the experiments.
Experiment Setup | Yes | "Every testing algorithm we presented has a tolerance parameter and significance δ. We used δ = 0.05 in every case. The tolerance parameter has an impact only on the sample size of the testing algorithms. Instead of setting it to a certain value, we plotted the power of the algorithms with various sample sizes. In this way, we could compare the performance of the testing algorithms based on the same number of samples as input. Each result we report here is computed based on 1000 repetitions. The central ranking of each model, from which the random samples are generated, is selected uniformly at random in each run independently."
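
The experiment-setup protocol quoted above (δ = 0.05, 1000 repetitions per setting, a fresh central ranking drawn uniformly at random in each run, and power reported as a function of sample size) can be illustrated with a minimal Python sketch. This is not the paper's released code: `estimate_power`, `uniformity_test`, and `sample_from_model` are hypothetical placeholder names introduced here, and the test itself (2SAMP, UNIF, or TRUN) is left as a callable to be supplied.

```python
import random

# Minimal sketch of the power-estimation protocol quoted above, not the paper's code.
# `uniformity_test` stands in for any of the paper's tests (2SAMP, UNIF, TRUN) and is
# assumed to return True when it rejects uniformity at significance `delta`;
# `sample_from_model` draws n rankings of m items from the model under test.
def estimate_power(uniformity_test, sample_from_model, m, n,
                   repetitions=1000, delta=0.05):
    """Estimate power as the rejection rate over `repetitions` independent runs,
    drawing a fresh central ranking uniformly at random in each run."""
    rejections = 0
    for _ in range(repetitions):
        central = tuple(random.sample(range(m), m))  # random central ranking
        data = sample_from_model(central, n)         # n sampled rankings of m items
        if uniformity_test(data, m, delta=delta):
            rejections += 1
    return rejections / repetitions

if __name__ == "__main__":
    # Toy stand-ins, only to make the sketch runnable: a uniform sampler (which
    # ignores the central ranking) and a placeholder test that rejects at rate delta.
    uniform_sampler = lambda central, n: [
        tuple(random.sample(range(len(central)), len(central))) for _ in range(n)
    ]
    placeholder_test = lambda data, m, delta: random.random() < delta
    for n in (100, 500, 1000):  # sweep the sample size, as in the reported plots
        print(n, estimate_power(placeholder_test, uniform_sampler, m=10, n=n))
```

With real implementations of the tests plugged in, sweeping n in this way reproduces the comparison described in the quote: all algorithms are evaluated on the same number of input samples rather than at a fixed tolerance value.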