Creating a Public Repository for Joining Private Data
Authors: James Cook, Milind Shyani, Nina Mishra
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7 Experiments. 'The goal of the experiments is to evaluate the accuracy of our solutions and quantify the effect of noise and join size on downstream tasks.' |
| Researcher Affiliation | Industry | James Cook (Independent Researcher, falsifian@falsifian.org); Milind Shyani (Amazon, mshyani@amazon.com); Nina Mishra (Amazon, nmishra@amazon.com). 'Work done while employed by Amazon.' |
| Pseudocode | Yes | Algorithm 1: Sender [...]; Algorithm 2: Receiver Linear Queries [...]; Algorithm 3: Receiver Optimize |
| Open Source Code | No | The paper does not include an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Datasets. Two datasets are used in the experiments. (1) UC Irvine Adult dataset [1]. We predict if income is greater than 50K based on categorical features such as age, workclass and education. (2) Extended MNIST, for which we predict labels based on images of handwritten digits. [1] Adult. UCI Machine Learning Repository, 1996. (See the loading sketch below the table.) |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., 'training set of 32,562 rows and a test set of 16,283 rows' for UCI Adult, and '240K (40K) images of digits to train (test)' for EMNIST) but does not explicitly provide details for a separate validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only describes training procedures. |
| Software Dependencies | No | The paper mentions software components like 'PyTorch', 'Adam optimizer', and 'AdamW' but does not specify their version numbers. |
| Experiment Setup | Yes | In the non-private setting, we train it using the Adam optimizer with a learning rate of 10⁻⁴ and a batch size of 128, and obtain a test accuracy of 98.04%. In the presence of privacy, we conduct experiments to quantify the change in accuracy as we vary d, ε and |D_R|/|D_R ⋈ D_S|. We use AdamW with a learning rate of 1×10⁻⁵ and a batch size of 2048 for 200 epochs. For the experiment involving different k in Fig. 4, we use the EMNIST bymerge dataset of 760K images. We train a ResNet of 11M parameters, using AdamW with a batch size of 1024 and a learning rate of 1×10⁻⁶ for 25 epochs. (These settings are collected in the optimizer sketch below the table.) |
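
Both datasets flagged in the Open Datasets row are publicly downloadable. The sketch below shows one plausible way to fetch them and confirm the train/test sizes quoted in the Dataset Splits row; the package choices (`ucimlrepo` for UCI Adult, `torchvision` for EMNIST) are assumptions, since the paper does not state which loaders the authors used.

```python
# Hedged sketch: one plausible way to fetch the two public datasets.
# ucimlrepo and torchvision are assumptions, not the authors' stated tooling.
from torchvision import datasets, transforms
from ucimlrepo import fetch_ucirepo

# (1) UCI Adult: predict whether income exceeds 50K from categorical features.
adult = fetch_ucirepo(id=2)  # id 2 is "Adult" in the UCI repository
X, y = adult.data.features, adult.data.targets
print(X.shape)  # 48,842 rows in total, before any train/test split

# (2) EMNIST digits: the paper reports 240K training and 40K test images.
to_tensor = transforms.ToTensor()
emnist_train = datasets.EMNIST("data", split="digits", train=True,
                               download=True, transform=to_tensor)
emnist_test = datasets.EMNIST("data", split="digits", train=False,
                              download=True, transform=to_tensor)
print(len(emnist_train), len(emnist_test))  # 240000 40000
```

The Fig. 4 experiment uses the bymerge variant instead, obtained by passing `split="bymerge"` to the same loader.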
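
The Experiment Setup row quotes three optimizer configurations. The sketch below collects them as PyTorch code; `make_model` is a stand-in, since the paper reports only a parameter count (an 11M-parameter ResNet for the Fig. 4 runs) rather than full architectures.

```python
# Hedged sketch of the three reported training configurations. make_model is
# a placeholder, not the authors' network; only the optimizer choices,
# learning rates, batch sizes, and epoch counts come from the paper.
import torch
import torch.nn as nn

def make_model(in_dim: int = 28 * 28, n_classes: int = 10) -> nn.Module:
    # Placeholder classifier standing in for the unspecified architectures.
    return nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, n_classes))

model = make_model()

# Non-private setting: Adam, lr 10^-4, batch size 128 (98.04% test accuracy).
opt_nonprivate = torch.optim.Adam(model.parameters(), lr=1e-4)
batch_nonprivate = 128

# Private setting: AdamW, lr 1e-5, batch size 2048, 200 epochs.
opt_private = torch.optim.AdamW(model.parameters(), lr=1e-5)
batch_private, epochs_private = 2048, 200

# Fig. 4 (varying k): AdamW on EMNIST bymerge, lr 1e-6, batch 1024, 25 epochs.
opt_fig4 = torch.optim.AdamW(model.parameters(), lr=1e-6)
batch_fig4, epochs_fig4 = 1024, 25
```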