Creating a Public Repository for Joining Private Data
Authors: James Cook, Milind Shyani, Nina Mishra
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7 Experiments. 'The goal of the experiments is to evaluate the accuracy of our solutions and quantify the effect of noise and join size on downstream tasks.' |
| Researcher Affiliation | Industry | James Cook (Independent Researcher, falsifian@falsifian.org); Milind Shyani (Amazon, mshyani@amazon.com); Nina Mishra (Amazon, nmishra@amazon.com). 'Work done while employed by Amazon.' |
| Pseudocode | Yes | Algorithm 1: Sender [...]; Algorithm 2: Receiver Linear Queries [...]; Algorithm 3: Receiver Optimize |
| Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Datasets. Two datasets are used in the experiments. (1) UC Irvine Adult dataset [1]. We predict if income is greater than 50K based on categorical features such as age, workclass and education. (2) Extended MNIST, for which we predict labels based on images of handwritten digits. [1] Adult. UCI Machine Learning Repository, 1996. (See the loading sketch below the table.) |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., 'training set of 32,562 rows and a test set of 16,283 rows' for UCI Adult, and '240K (40K) images of digits to train (test)' for EMNIST) but does not explicitly provide details for a separate validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only describes training procedures. |
| Software Dependencies | No | The paper mentions software components like 'PyTorch', 'Adam optimizer', and 'AdamW' but does not specify their version numbers. |
| Experiment Setup | Yes | In the non-private setting, we train it using the Adam optimizer with a learning rate of 10⁻⁴ and a batch size of 128, and obtain a test accuracy of 98.04%. In the presence of privacy, we conduct experiments to quantify the change in accuracy as we vary d, ε and |D_R|/|D_R ⋈ D_S|. We use AdamW with a learning rate of 1×10⁻⁵ and a batch size of 2048 for 200 epochs. For the experiment involving different k in Fig. 4, we use the EMNIST bymerge dataset of 760K images. We train a ResNet of 11M parameters, using AdamW with a batch size of 1024 and a learning rate of 1×10⁻⁶ for 25 epochs. (These settings are collected in the optimizer sketch below the table.) |
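
Both datasets flagged in the Open Datasets row are publicly downloadable. The sketch below shows one plausible way to fetch them and confirm the train/test sizes quoted in the Dataset Splits row; the package choices (`ucimlrepo` for UCI Adult, `torchvision` for EMNIST) are assumptions, since the paper does not state which loaders the authors used.

```python
# Hedged sketch: one plausible way to fetch the two public datasets.
# ucimlrepo and torchvision are assumptions, not the authors' stated tooling.
from torchvision import datasets, transforms
from ucimlrepo import fetch_ucirepo

# (1) UCI Adult: predict whether income exceeds 50K from categorical features.
adult = fetch_ucirepo(id=2)  # id 2 is "Adult" in the UCI repository
X, y = adult.data.features, adult.data.targets
print(X.shape)  # 48,842 rows in total, before any train/test split

# (2) EMNIST digits: the paper reports 240K training and 40K test images.
to_tensor = transforms.ToTensor()
emnist_train = datasets.EMNIST("data", split="digits", train=True,
                               download=True, transform=to_tensor)
emnist_test = datasets.EMNIST("data", split="digits", train=False,
                              download=True, transform=to_tensor)
print(len(emnist_train), len(emnist_test))  # 240000 40000
```

The Fig. 4 experiment uses the bymerge variant instead, obtained by passing `split="bymerge"` to the same loader.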
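
The Experiment Setup row quotes three optimizer configurations. The sketch below collects them as PyTorch code; `make_model` is a stand-in, since the paper reports only a parameter count (an 11M-parameter ResNet for the Fig. 4 runs) rather than full architectures.

```python
# Hedged sketch of the three reported training configurations. make_model is
# a placeholder, not the authors' network; only the optimizer choices,
# learning rates, batch sizes, and epoch counts come from the paper.
import torch
import torch.nn as nn

def make_model(in_dim: int = 28 * 28, n_classes: int = 10) -> nn.Module:
    # Placeholder classifier standing in for the unspecified architectures.
    return nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, n_classes))

model = make_model()

# Non-private setting: Adam, lr 10^-4, batch size 128 (98.04% test accuracy).
opt_nonprivate = torch.optim.Adam(model.parameters(), lr=1e-4)
batch_nonprivate = 128

# Private setting: AdamW, lr 1e-5, batch size 2048, 200 epochs.
opt_private = torch.optim.AdamW(model.parameters(), lr=1e-5)
batch_private, epochs_private = 2048, 200

# Fig. 4 (varying k): AdamW on EMNIST bymerge, lr 1e-6, batch 1024, 25 epochs.
opt_fig4 = torch.optim.AdamW(model.parameters(), lr=1e-6)
batch_fig4, epochs_fig4 = 1024, 25
```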