Re-Analyze Gauss: Bounds for Private Matrix Approximation via Dyson Brownian Motion

Authors: Oren Mangoubi, Nisheeth Vishnoi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our bounds are obtained by viewing the addition of Gaussian noise as a continuous-time matrix Brownian motion. This viewpoint allows us to track the evolution of eigenvalues and eigenvectors of the matrix, which are governed by stochastic differential equations discovered by Dyson. These equations allow us to bound the utility as the square-root of a sum-of-squares of perturbations to the eigenvectors, as opposed to a sum of perturbation bounds obtained via Davis-Kahan-type theorems. Empirically, we observe that Assumption 2.1 is satisfied for covariance matrices of many real-world datasets (see Appendix J), as well as on Wishart random matrices W = AᵀA, where A is an m × d matrix of i.i.d. Gaussian entries, for sufficiently large m (see Appendix I). (A hedged numerical sketch of this eigenvector-perturbation quantity appears after this table.)
Researcher Affiliation | Academia | Oren Mangoubi (Worcester Polytechnic Institute); Nisheeth K. Vishnoi (Yale University)
Pseudocode | No | The paper describes mathematical derivations and theoretical concepts, including stochastic differential equations and lemmas, but it does not present any pseudocode or algorithm blocks.
Open Source Code | Yes | Checklist question: "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?" [Yes] (See Appendix I)
Open Datasets | Yes | We observe empirically that Assumption 2.1 is satisfied on a number of real-world datasets which were previously used as benchmarks in the differentially private matrix approximation literature [11, 2] (see Appendix J). Assumption 2.1 is also satisfied, for instance, by random Wishart matrices W = AᵀA, where A is an m × d matrix of i.i.d. Gaussian entries, which are a popular model for sample covariance matrices [47]. (A sketch of this kind of eigenvalue-gap check appears after this table.)
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages or sample counts. While it mentions empirical verification and Appendix I, the main text does not detail any data partitioning strategy for model training.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run its experiments, such as CPU/GPU models, memory, or cloud instance types. While the checklist points to Appendix I, that appendix does not contain specific hardware specifications either.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments. It does not mention Python, PyTorch, or any other programming environment with version details.
Experiment Setup | No | The paper does not contain specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size), optimizer settings, or other concrete training configurations. While it mentions Appendix I for empirical verification, it does not provide these details in the main text.
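
The eigenvector-perturbation quantity quoted in the Research Type row can be illustrated numerically. The sketch below is not the authors' code: it draws a Wishart-type matrix, adds one symmetric Gaussian perturbation (a single realization of the matrix Brownian motion at a fixed time), and reports the square root of the sum of squared per-eigenvector perturbations of the top-k eigenspace. The dimensions and the noise scale `sigma` are illustrative placeholders, not privacy-calibrated parameters.

```python
import numpy as np

# Minimal sketch (not the authors' implementation): perturb a covariance-style
# matrix with symmetric Gaussian noise and measure per-eigenvector perturbations
# of the top-k eigenspace. `sigma` and the dimensions are arbitrary placeholders.

rng = np.random.default_rng(0)
m, d, k, sigma = 500, 20, 5, 0.1

A = rng.standard_normal((m, d))
W = A.T @ A / m                        # Wishart-type sample covariance

G = rng.standard_normal((d, d))
E = sigma * (G + G.T) / np.sqrt(2)     # symmetric Gaussian perturbation
W_noisy = W + E

_, V = np.linalg.eigh(W)               # eigh returns eigenvalues in ascending order
_, V_hat = np.linalg.eigh(W_noisy)
Vk, Vk_hat = V[:, -k:], V_hat[:, -k:]  # top-k eigenvectors of each matrix

# Align signs column by column, then take the square root of the sum of squared
# per-eigenvector perturbations, roughly the quantity described in the quoted
# utility bound (up to sign alignment and constants).
signs = np.sign(np.sum(Vk * Vk_hat, axis=0))
per_vector_err = np.linalg.norm(Vk_hat * signs - Vk, axis=0)
print("sqrt(sum of squared eigenvector perturbations):",
      np.sqrt(np.sum(per_vector_err ** 2)))
```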
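
The Open Datasets row also mentions checking Assumption 2.1 empirically on Wishart matrices. The precise inequality in Assumption 2.1 must be read from the paper; the hypothetical snippet below only reports the raw eigenvalue gaps of one random draw of W = AᵀA with i.i.d. Gaussian entries.

```python
import numpy as np

# Hypothetical check in the spirit of the one described above: inspect the
# eigenvalue gaps of a Wishart matrix W = A^T A. The exact threshold required
# by Assumption 2.1 is stated in the paper; this only prints the raw gaps.

rng = np.random.default_rng(1)
m, d = 2000, 50                                   # "sufficiently large" m relative to d
A = rng.standard_normal((m, d))                   # i.i.d. Gaussian entries
W = A.T @ A

eigvals = np.sort(np.linalg.eigvalsh(W))[::-1]    # eigenvalues in descending order
gaps = eigvals[:-1] - eigvals[1:]
print("smallest eigenvalue gap:", gaps.min())
print("largest eigenvalue gap:", gaps.max())
```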