Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Tight Data Access Bounds for Private Top-$k$ Selection
Authors: Hao Wu, Olga Ohrimenko, Anthony Wirth
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the top-k selection problem under the differential privacy model: m items are rated according to votes of a set of clients. We consider a setting in which algorithms can retrieve data via a sequence of accesses, each either a random access or a sorted access; the goal is to minimize the total number of data accesses. Our algorithm requires only O(mk) expected accesses: to our knowledge, this is the first sublinear data-access upper bound for this problem. Our analysis also shows that the well-known exponential mechanism requires only O(m) expected accesses. Accompanying this, we develop the first lower bounds for the problem, in three settings: only random accesses; only sorted accesses; a sequence of accesses of either kind. We show that, to avoid Ω(m) access cost, supporting both kinds of access is necessary, and that in this case our algorithm s access cost is optimal. |
| Researcher Affiliation | Academia | 1School of Computing and Information Systems, The University of Melbourne. Correspondence to: Hao Wu <EMAIL>, Olga Ohrimenko <EMAIL>, Anthony Wirth <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Threshold Algorithm ATA (Fagin et al., 2003); Algorithm 2 Private Top-k Algorithm M; Algorithm 3 Private Threshold Algorithm APriv TA; Algorithm 4 Algorithm Aoracle. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not use real-world or simulated datasets that are made publicly available. It defines abstract data models (e.g., 'Let C .= {1, . . . , m} be a set of m items, and U .= {1, . . . , n} be a set of n clients.'). |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies with version numbers required for reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training settings. |