Tight Data Access Bounds for Private Top-$k$ Selection
Authors: Hao Wu, Olga Ohrimenko, Anthony Wirth
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the top-k selection problem under the differential privacy model: m items are rated according to votes of a set of clients. We consider a setting in which algorithms can retrieve data via a sequence of accesses, each either a random access or a sorted access; the goal is to minimize the total number of data accesses. Our algorithm requires only O(mk) expected accesses: to our knowledge, this is the first sublinear data-access upper bound for this problem. Our analysis also shows that the well-known exponential mechanism requires only O(m) expected accesses. Accompanying this, we develop the first lower bounds for the problem, in three settings: only random accesses; only sorted accesses; a sequence of accesses of either kind. We show that, to avoid Ω(m) access cost, supporting both kinds of access is necessary, and that in this case our algorithm s access cost is optimal. |
| Researcher Affiliation | Academia | 1School of Computing and Information Systems, The University of Melbourne. Correspondence to: Hao Wu <whw4@student.unimelb.edu.au>, Olga Ohrimenko <oohrimenko@unimelb.edu.au>, Anthony Wirth <awirth@unimelb.edu.au>. |
| Pseudocode | Yes | Algorithm 1 Threshold Algorithm ATA (Fagin et al., 2003); Algorithm 2 Private Top-k Algorithm M; Algorithm 3 Private Threshold Algorithm APriv TA; Algorithm 4 Algorithm Aoracle. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not use real-world or simulated datasets that are made publicly available. It defines abstract data models (e.g., 'Let C .= {1, . . . , m} be a set of m items, and U .= {1, . . . , n} be a set of n clients.'). |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. |