Hiding Numerical Vectors in Local Private and Shuffled Messages

Authors: Shaowei Wang, Jin Li, Yuqiu Qian, Jiachun Du, Wenqing Lin, Wei Yang

IJCAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We experimentally validate and compare it with existing approaches, and demonstrate its significant error reduction. ... The statistical efficiency of the proposed Collision mechanism for ϵ-LDP numerical vector aggregation is evaluated in this section. Competing mechanisms include the PrivKV mechanism [Ye et al., 2019], the PCKV mechanism with general randomized response as the base randomizer (denoted as PCKV-GRR), and the PCKV mechanism with unary encoding as the base randomizer [Gu et al., 2020] (denoted as PCKV-UE). Since the performances of all these mechanisms are data-independent, it is enough to utilize synthetic datasets for fair evaluation. The parameters of synthetic datasets are listed as follows (default values are in bold form), covering most cases encountered in real-world applications: i. Number of users n: 10,000, 100,000. ii. Dimension d: 256, 1024. iii. Sparsity parameter s: 4, 8, 16, 32. iv. Privacy budget ϵ: 0.001, 0.01, 0.1, 0.2, 0.4, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0. During each simulation, the numerical vector of each user is independent-randomly generated, the non-zero entries are uniform-randomly selected from d dimensions, and each dimension has an equal probability of being 1 or -1.
Researcher Affiliation Collaboration Shaowei Wang¹, Jin Li¹, Yuqiu Qian², Jiachun Du², Wenqing Lin², Wei Yang³. ¹Institute of Artificial Intelligence and Blockchain, Guangzhou University; ²Interactive Entertainment Group, Tencent Inc.; ³Department of Computer Science and Technology, University of Science and Technology of China
Pseudocode No The paper describes the Collision mechanism in text (Definition 3) and provides mathematical formulas but does not include a structured pseudocode or algorithm block.
Open Source Code No The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described.
Open Datasets No The paper states: "it is enough to utilize synthetic datasets for fair evaluation. The parameters of synthetic datasets are listed as follows (default values are in bold form)... During each simulation, the numerical vector of each user is independent-randomly generated, the non-zero entries are uniform-randomly selected from d dimensions, and each dimension has an equal probability of being 1 or -1." This indicates that the authors generated their own synthetic datasets based on specified parameters, rather than using a publicly available or open dataset.
Dataset Splits No The paper mentions generating synthetic datasets with parameters such as "Number of users n", "Dimension d", "Sparsity parameter s", and "Privacy budget ϵ". However, it does not specify any training, validation, or test dataset splits in terms of percentages, sample counts, or references to predefined splits for these generated datasets.
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies No The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup Yes The parameters of synthetic datasets are listed as follows (default values are in bold form), covering most cases encountered in real-world applications: i. Number of users n: 10,000, 100,000. ii. Dimension d: 256, 1024. iii. Sparsity parameter s: 4, 8, 16, 32. iv. Privacy budget ϵ: 0.001, 0.01, 0.1, 0.2, 0.4, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0. ... The total variation error (TVE) is defined as: $\mathrm{TVE} = \sum_{j \in [1,d],\, b \in \{-1,1\}} \left| \widehat{[j_b \in Y_X]} - [j_b \in Y_X] \right|_1$, and the maximum absolute error (MAE) is defined as: $\mathrm{MAE} = \max_{j \in [1,d],\, b \in \{-1,1\}} \left| \widehat{[j_b \in Y_X]} - [j_b \in Y_X] \right|_1$. ... All experimental results are the mean natural logarithm value of 10 repeated simulations.
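Since the paper releases no code, the synthetic setup and the two error metrics quoted above can be approximated with a minimal sketch like the one below. It is not the authors' implementation. The function names (`generate_sparse_vectors`, `true_frequencies`, `tve`, `mae`) are hypothetical, and treating the true quantity for each pair (j, b) as the fraction of users whose entry j equals b is an assumption about the paper's notation. The private estimator itself (Collision, PrivKV, PCKV) is deliberately left out.

```python
import random


def generate_sparse_vectors(n, d, s, seed=0):
    """Return n sparse vectors as {dimension: value} dicts.

    Each vector has s non-zero entries placed on dimensions chosen
    uniformly at random, and each non-zero entry is +1 or -1 with
    equal probability, following the setup quoted from the paper.
    """
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        support = rng.sample(range(d), s)  # s distinct dimensions
        users.append({j: rng.choice((-1, 1)) for j in support})
    return users


def true_frequencies(users, d):
    """Fraction of users holding value b at dimension j, for every (j, b).

    Assumption: this normalised count plays the role of the true
    quantity that the estimates are compared against.
    """
    counts = {(j, b): 0 for j in range(d) for b in (-1, 1)}
    for vec in users:
        for j, b in vec.items():
            counts[(j, b)] += 1
    n = len(users)
    return {key: c / n for key, c in counts.items()}


def tve(estimated, truth):
    """Total variation error: sum of absolute errors over all (j, b)."""
    return sum(abs(estimated[key] - truth[key]) for key in truth)


def mae(estimated, truth):
    """Maximum absolute error: largest absolute error over all (j, b)."""
    return max(abs(estimated[key] - truth[key]) for key in truth)


if __name__ == "__main__":
    users = generate_sparse_vectors(n=10_000, d=256, s=4, seed=42)
    truth = true_frequencies(users, d=256)
    # Placeholder estimate: the truth itself, so both errors are zero.
    print(tve(truth, truth), mae(truth, truth))
```

To reproduce a data point from the paper, the placeholder estimate would be replaced by the output of an ϵ-LDP mechanism run on `users`, and the run would be repeated 10 times as described in the setup.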