Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Private and Non-private Uniformity Testing for Ranking Data
Authors: Róbert Busa-Fekete, Dimitris Fotakis, Emmanouil Zampetakis
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We carry out large-scale experiments, including m = 10,000, to show that our uniformity testing algorithms scale gracefully with m." and "7 Experiments: We shall present synthetic experiments to assess the performance of the proposed tests." |
| Researcher Affiliation | Collaboration | Róbert Busa-Fekete Google Research, New York, USA EMAIL Dimitris Fotakis National Technical University of Athens, Greece EMAIL Manolis Zampetakis University of California, Berkeley, USA EMAIL |
| Pseudocode | Yes | Algorithm 1 2SAMP: Uniformity Test with Two Samples, Algorithm 2 Uniformity Test (UNIF), Algorithm 3 Central DP Uniformity Test (TRUN), Algorithm 5, Algorithm 6 |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | No | "We shall present synthetic experiments to assess the performance of the proposed tests." and "We used synthetic data." No information is provided about public access to the synthetic data itself or about the exact generation process needed to reproduce it as a dataset. |
| Dataset Splits | No | The paper discusses sample complexity for statistical tests and uses synthetic data, but does not mention any training, validation, or test dataset splits. |
| Hardware Specification | No | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] We used data centers to compute the experiments. I believe that it is not so relevant to this work how long the computation did take. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for the key software components used in the experiments. |
| Experiment Setup | Yes | Every testing algorithm we presented has a tolerance parameter and a significance level δ. We used δ = 0.05 in every case. The tolerance parameter affects only the sample size of the testing algorithms. Instead of fixing it to a particular value, we plotted the power of the algorithms at various sample sizes, so that the testing algorithms could be compared on the same number of input samples. Each result reported here is computed from 1000 repetitions. The central ranking of the model from which the random samples are generated is selected uniformly at random in each run, independently. |
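The experimental protocol in the table (significance δ = 0.05, power estimated over 1000 repetitions at a fixed sample size, with rankings drawn either uniformly or from a biased model) can be illustrated with a minimal sketch. This is *not* the paper's 2SAMP/UNIF/TRUN algorithms: as a hypothetical stand-in test, it applies a chi-squared goodness-of-fit check to the marginal distribution of the top-ranked item, which is sufficient to show how the rejection rate is estimated empirically.

```python
import random

M = 5                      # number of items; rankings are permutations of range(M)
DELTA = 0.05               # significance level, as in the paper's experiments
CHI2_CRIT = 9.4877         # chi-squared 0.95 quantile, M - 1 = 4 degrees of freedom

def first_position_test(rankings):
    """Reject uniformity if the marginal of the top-ranked item deviates
    too far from uniform (chi-squared goodness-of-fit over M cells)."""
    n = len(rankings)
    counts = [0] * M
    for r in rankings:
        counts[r[0]] += 1
    expected = n / M
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat > CHI2_CRIT          # True = reject the null of uniformity

def uniform_ranking():
    r = list(range(M))
    random.shuffle(r)
    return r

def biased_ranking():
    # Toy alternative model: item 0 is forced to the top with probability 1/2.
    if random.random() < 0.5:
        rest = list(range(1, M))
        random.shuffle(rest)
        return [0] + rest
    return uniform_ranking()

def rejection_rate(sampler, n_samples=200, reps=1000):
    """Fraction of repetitions in which the test rejects, mirroring the
    paper's protocol of estimating power from 1000 repetitions."""
    rejections = sum(
        first_position_test([sampler() for _ in range(n_samples)])
        for _ in range(reps)
    )
    return rejections / reps

random.seed(0)
print(f"type-I error (uniform data): {rejection_rate(uniform_ranking):.3f}")
print(f"power (biased data):         {rejection_rate(biased_ranking):.3f}")
```

Under uniform data the rejection rate should stay near δ = 0.05 (the type-I error), while under the biased model it should approach 1, which is exactly the power-versus-sample-size comparison the paper plots.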