Dual Query: Practical Private Query Release for High Dimensional Data
Authors: Marco Gaboardi, Emilio Jesus Gallego Arias, Justin Hsu, Aaron Roth, Zhiwei Steven Wu
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate Dual Query on a large collection of 3-way marginal queries on several real datasets (Figure 1) and high dimensional synthetic data. Adult and KDD99 are from the UCI repository (Bache & Lichman, 2013), and have a mixture of discrete (but nonbinary) and continuous attributes, which we discretize into binary attributes. We also use the (in)famous Netflix movie ratings dataset, with more than 17,000 binary attributes. We report maximum error in Figure 2, averaged over 5 runs. |
| Researcher Affiliation | Academia | Marco Gaboardi (m.gaboardi@dundee.ac.uk), University of Dundee, Dundee, Scotland, UK; Emilio Jesús Gallego Arias (emilioga@cis.upenn.edu), Justin Hsu (justhsu@cis.upenn.edu), Aaron Roth (aaroth@cis.upenn.edu), and Zhiwei Steven Wu (wuzhiwei@cis.upenn.edu), University of Pennsylvania, Philadelphia, USA |
| Pseudocode | Yes | Algorithm 1 Dual Query |
| Open Source Code | No | The paper does not provide any concrete access information, such as a link to a repository or an explicit statement about releasing the source code for the methodology described. |
| Open Datasets | Yes | Adult and KDD99 are from the UCI repository (Bache & Lichman, 2013), and have a mixture of discrete (but nonbinary) and continuous attributes, which we discretize into binary attributes. We also use the (in)famous Netflix movie ratings dataset, with more than 17,000 binary attributes. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It describes evaluating on collections of 3-way marginal queries, but not how each dataset was partitioned for training, validation, or testing. |
| Hardware Specification | Yes | We ran the experiments on a mid-range desktop machine with a 4-core Intel Xeon processor and 12 GB of RAM. |
| Software Dependencies | No | The paper states that "the implementation is written in OCaml, using the CPLEX constraint solver," but gives no version numbers for OCaml or CPLEX. |
| Experiment Setup | Yes | Rather than set the parameters as in Algorithm 1, we experiment with a range of parameters. For instance, we frequently run for fewer rounds (lower T) and take fewer samples (lower s). Heuristically, we set a timeout for each CPLEX call to 20 seconds, accepting the best current solution if we hit the timeout. |
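
The Open Datasets row notes that the nonbinary and continuous attributes of Adult and KDD99 are discretized into binary attributes, but the method is not specified. A minimal sketch of one common choice is shown here, assuming one-hot encoding for categorical attributes and one-hot bucket indicators for continuous ones. The `schema` format, the bucket edges, and the helper names (`one_hot`, `bucketize`, `binarize_row`) are all hypothetical and are not taken from the paper.

```python
import bisect

def one_hot(value, categories):
    """One indicator bit per category (hypothetical encoding choice)."""
    return [int(value == c) for c in categories]

def bucketize(value, edges):
    """One indicator bit per bucket; sorted `edges` split the real line
    into len(edges) + 1 buckets, e.g. (-inf, 25), [25, 45), [45, 65), [65, inf)."""
    idx = bisect.bisect_right(edges, value)
    return [int(i == idx) for i in range(len(edges) + 1)]

def binarize_row(row, schema):
    """Map one raw record to a flat list of binary attributes.
    `schema` is a hypothetical list of (kind, spec) pairs, one per column."""
    bits = []
    for value, (kind, spec) in zip(row, schema):
        if kind == "categorical":
            bits.extend(one_hot(value, spec))
        else:  # continuous: spec holds the bucket edges
            bits.extend(bucketize(value, spec))
    return bits

# Illustrative schema: one 3-valued categorical column, one continuous column.
schema = [("categorical", ["a", "b", "c"]), ("continuous", [25, 45, 65])]
print(binarize_row(("c", 30), schema))
```

With this encoding, the example record `("c", 30)` becomes `[0, 0, 1, 0, 1, 0, 0]`: the last category bit is set, and 30 falls in the `[25, 45)` bucket. Finer buckets raise the dimension, which is the regime the paper targets.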
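
The Pseudocode and Experiment Setup rows refer to Algorithm 1 and its parameters (rounds T, samples s, and a 20-second CPLEX timeout). Below is a minimal, non-authoritative sketch of the loop as we understand it. Multiplicative weights maintain a distribution over queries. Each round samples s queries, finds a record that best responds to them, and upweights queries that this record under-answers. Several assumptions apply. The query class is assumed to be monotone 3-way conjunctions plus their negations. Brute-force search over all 2^d records replaces the CPLEX integer program, keeping the best record found when the time budget runs out. The values of `eta`, `T`, and `s` are arbitrary. No privacy accounting is done, so the sketch does not reproduce the paper's analysis tying these parameters to a privacy level. All function names are hypothetical.

```python
import itertools
import math
import random
import time

def conjunction(attrs):
    """Monotone conjunction: 1 if every attribute in `attrs` is 1."""
    return lambda x: int(all(x[i] for i in attrs))

def three_way_queries(d):
    """All monotone 3-way conjunctions plus their negations (assumed class)."""
    queries = []
    for attrs in itertools.combinations(range(d), 3):
        q = conjunction(attrs)
        queries.append(q)
        queries.append(lambda x, q=q: 1 - q(x))  # negated query
    return queries

def best_response(sampled, domain, timeout):
    """Record maximizing the summed sampled queries. Brute force stands in
    for the CPLEX integer program; on timeout, keep the best found so far."""
    deadline = time.monotonic() + timeout
    best, best_score = None, -1
    for x in domain:
        score = sum(q(x) for q in sampled)
        if score > best_score:
            best, best_score = x, score
        if time.monotonic() > deadline:
            break
    return best

def dual_query(data, d, T=20, s=10, eta=0.5, timeout=20.0, seed=0):
    """Return T synthetic records; the synthetic dataset is their multiset."""
    rng = random.Random(seed)
    queries = three_way_queries(d)
    true_ans = [sum(q(x) for x in data) / len(data) for q in queries]
    weights = [1.0 / len(queries)] * len(queries)  # Q_1: uniform over queries
    domain = list(itertools.product((0, 1), repeat=d))  # all 2^d records
    synthetic = []
    for _ in range(T):
        # Sample s queries according to the current distribution Q_t.
        sampled = rng.choices(queries, weights=weights, k=s)
        x_t = best_response(sampled, domain, timeout)
        synthetic.append(x_t)
        # Multiplicative weights: scale each query by exp(eta * (q(D) - q(x_t))),
        # so queries that x_t under-answers gain weight, then renormalize.
        weights = [w * math.exp(eta * (a - q(x_t)))
                   for w, a, q in zip(weights, true_ans, queries)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return synthetic

data = [(1, 1, 1, 1)] * 8 + [(0, 0, 0, 0)] * 2
print(dual_query(data, d=4, T=10, s=5))
```

Brute force is only feasible for very small d. The paper's setting (for example, Netflix with more than 17,000 binary attributes) is why the best-response step is posed to a solver instead. Lowering `T` and `s`, as the Experiment Setup row describes, directly reduces the number of best-response calls and the work per call.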
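
The Research Type row reports maximum error over 3-way marginal queries, averaged over 5 runs. The sketch below shows one way to compute that metric. It assumes the marginals are monotone 3-way conjunctions over binary attributes, answered as the fraction of records satisfying them. Negated queries can be skipped because |(1 - a) - (1 - b)| = |a - b|, so they do not change the maximum. `make_synthetic` is a hypothetical placeholder for any seeded synthetic-data generator.

```python
import itertools

def marginal_answer(data, attrs):
    """Fraction of records whose attributes in `attrs` are all 1."""
    return sum(all(row[i] for i in attrs) for row in data) / len(data)

def max_marginal_error(data, synthetic, d, k=3):
    """Maximum absolute error over all k-way monotone conjunctions."""
    return max(
        abs(marginal_answer(data, S) - marginal_answer(synthetic, S))
        for S in itertools.combinations(range(d), k)
    )

def average_max_error(data, make_synthetic, d, runs=5):
    """Average the maximum error over independent seeded runs."""
    return sum(max_marginal_error(data, make_synthetic(seed), d)
               for seed in range(runs)) / runs

data = [(1, 1, 1, 0), (1, 1, 1, 1), (0, 1, 1, 1), (1, 0, 1, 1)]
synthetic = [(1, 1, 1, 1), (1, 1, 1, 1)]
print(max_marginal_error(data, synthetic, d=4))
```

Here the all-ones synthetic data answers every marginal with 1.0. The true answer for attributes (0, 1, 3) is 0.25, so the maximum error is 0.75.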