Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Fair Value of Data Under Heterogeneous Privacy Constraints in Federated Learning

Authors: Justin Singh Kang, Ramtin Pedarsani, Kannan Ramchandran

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We numerically investigate the mechanism design problem under this example and see how heterogeneity impacts the optimal behavior of the platform. Finally, Section 5 explores the platform mechanism design problem. In Theorem 3 we establish that there are three distinct regimes in which the platform's optimal behavior differs depending on the common privacy sensitivity of the users. [...] Fig. 7 shows the numerical solution to Equation 19 for two different choices of s² and r².
Researcher Affiliation | Academia | Justin S. Kang (UC Berkeley), Ramtin Pedarsani (UC Santa Barbara), Kannan Ramchandran (UC Berkeley)
Pseudocode | Yes | Algorithm 1: Find optimal α
    input : n_i, c_i for i = 1, ..., N
    output: α*
    nArray(i) ← n_i for i = 1, ..., N; cArray(i) ← c_i for i = 1, ..., N
    partitions ← GetValidPartitions(n_1, ..., n_N)    /* all ρ that produce unique U */
    for i = 1 to len(partitions) do
        ρ ← partitions(i)                             /* one representative ρ from the partition */
        for j = 1 to N do
            φ(j) ← Shapley(ρ, j, nArray)              /* actual code skips repeated calculations */
        end
        for α in grid do
            neExists ← TreeSearch(α·φ, cArray)        /* check if any ρ in the partition is an NE */
            if neExists then
                currUtil ← (1 − α)·Utility(ρ, nArray)
                if currUtil > maxUtil then
                    α* ← α                            /* update α* if needed */
                    maxUtil ← currUtil
                end
            end
        end
    end
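The quoted pseudocode combines a Shapley-value computation with a grid search over the payment fraction α. The following is a simplified, hypothetical sketch of that core loop only: it omits the paper's partition enumeration and Nash-equilibrium tree search, and the function names (`shapley_values`, `optimal_alpha`) and the participation rule α·φ_i ≥ c_i are illustrative assumptions, not the paper's actual mechanism.

```python
import itertools
import math

def shapley_values(n, value):
    """Exact Shapley values for players 0..n-1 under coalition value function `value`."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            for s in itertools.combinations(others, k):
                # Weight of a coalition S not containing i: |S|! (n-|S|-1)! / n!
                w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                phi[i] += w * (value(set(s) | {i}) - value(set(s)))
    return phi

def optimal_alpha(costs, value, grid):
    """Grid-search the payment fraction alpha maximizing the platform's share,
    assuming (hypothetically) that user i participates iff alpha * phi_i >= c_i."""
    n = len(costs)
    phi = shapley_values(n, value)
    best_alpha, max_util = None, float("-inf")
    for alpha in grid:
        participants = {i for i in range(n) if alpha * phi[i] >= costs[i]}
        curr_util = (1 - alpha) * value(participants)  # platform keeps (1 - alpha)
        if curr_util > max_util:
            best_alpha, max_util = alpha, curr_util
    return best_alpha, max_util

# Toy example with additive value v(S) = sum of n_i over S, so phi_i = n_i exactly.
n_data = [1.0, 2.0]
alpha, util = optimal_alpha([0.5, 0.5],
                            lambda s: sum(n_data[i] for i in s),
                            [k / 10 for k in range(11)])
```

In the toy example the platform's payoff is maximized at α = 0.5, the smallest grid point at which both users' Shapley payments cover their costs.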
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | No | In this example, we use DP as our heterogeneous privacy framework. Let X_i represent the independent and identically distributed data of user i, with Pr(X_i = 1/2) = p and Pr(X_i = −1/2) = 1 − p, where p ~ Unif(0, 1). The platform's goal is to construct an ε-DP estimator for µ := E[X_i] = p − 1/2 that minimizes Bayes risk. [...] There is no mention of a specific, publicly available dataset used in the experiments. The data is described as synthetically generated for an example scenario.
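The quoted example asks for an ε-DP estimator of the mean of ±1/2-valued data. As an illustration only (this is the textbook Laplace mechanism, not the paper's Bayes-optimal heterogeneous-DP estimator, and the helper name `dp_mean_estimate` is hypothetical), a minimal sketch:

```python
import random

def dp_mean_estimate(samples, epsilon):
    """epsilon-DP estimate of the mean of samples bounded in [-1/2, 1/2].

    Changing one sample moves the empirical mean of n samples by at most 1/n,
    so adding Laplace(0, 1/(n * epsilon)) noise gives epsilon-DP
    (the standard Laplace mechanism).
    """
    n = len(samples)
    scale = 1.0 / (n * epsilon)  # noise scale = sensitivity / epsilon
    # Laplace(0, b) sampled as the difference of two iid Exponential(1/b) draws
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return sum(samples) / n + noise

# X_i in {-1/2, +1/2} with Pr(X_i = 1/2) = p, so E[X_i] = p - 1/2.
samples = [0.5] * 600 + [-0.5] * 400   # empirical p = 0.6, empirical mean = 0.1
estimate = dp_mean_estimate(samples, epsilon=1.0)
```

With n = 1000 and ε = 1 the noise scale is 0.001, so the private estimate is very close to the empirical mean.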
Dataset Splits | No | The paper describes a theoretical framework and uses simulated data for its examples and numerical solutions. It does not mention or define any training, validation, or test splits, as would be common in empirical machine learning experiments on a dataset.
Hardware Specification | No | The paper focuses on theoretical analysis and numerical simulations. There is no mention of specific hardware (e.g., GPU/CPU models, cloud instances) used to run these simulations or computations.
Software Dependencies | No | The paper describes mathematical models, theorems, and algorithms, including a numerical solution approach. However, it does not specify any software libraries, frameworks, or version numbers used for implementation or numerical computation.
Experiment Setup | No | The paper describes problem formulations, utility functions, and parameters for theoretical analysis (e.g., N = 10 users, s² = 100, r² = 1 for heterogeneity). It also mentions a 'grid search to determine the optimal α' for the mechanism design problem. However, it does not provide details of a typical experimental setup involving machine learning models, such as hyperparameters (learning rate, batch size, epochs), model initialization, or optimizer settings. The 'numerical solution' described concerns finding optimal game-theoretic parameters rather than training a machine learning model.