Decentralized Matrix Sensing: Statistical Guarantees and Fast Convergence
Authors: Marie Maros, Gesualdo Scutari
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All simulations are performed on an Apple M2 Pro @ 3.5 GHz computer with 32 GB RAM, running macOS Ventura 13.3.1. We generate a random matrix $X \in \mathbb{R}^{d \times r_\star}$, with $d = 50$ and $r_\star = 2$, which we use throughout all experiments. The symmetric measurement matrices are generated as $A_i = \tfrac{1}{2}(S_i + S_i^\top)$, where the $S_i \in \mathbb{R}^{d \times d}$ have i.i.d. standard Gaussian elements. The communication networks are generated as Erdős–Rényi graphs with link activation probability $p = 0.05$ and different sizes $m$ (specified in each experiment below). For any generated graph, we set $\bar{W}$ according to Metropolis weights [28] and then let $W = \bar{W}^K$, with the integer $K$ chosen to meet condition (12) on $\rho$, resulting in $K$ communication rounds per agent/iteration. Finally, we choose a stepsize of $1/4$ and $\mu = d^{-3}$ (see the simulation sketch after the table). (i) Validating Theorem 1: This experiment shows that, under the conditions of Theorem 1, the test error behaves predictably like that of centralized GD (up to constant factors). Furthermore, the invariance of this error with respect to the network size $m$ is also confirmed, as long as $\rho = O(1/m^6)$ (as required in (12)). In the experiments, the total sample size is $N = 1000$, split equally among $m \in \{5, 100, 500, 1000\}$ agents. Fig. 1a plots the normalized test error; Fig. 1b shows $\|(V_M^\perp)^\top V_{U^t}\|$, which measures the misalignment of $U^t$ with the power-method matrix; and Fig. 1c displays $\sigma_{r_\star}(U^t)/\sigma_{r_\star}(X)$, which, combined with Fig. 1b, allows us to claim that the signal $U^t Q_t$ is well aligned with $V_{Z_\star}$ and full rank. The curves show that the behavior of the decentralized algorithm is close to that of centralized GD (blue lines). As predicted, the error decays quickly after the correct subspace has been identified. Furthermore, the convergence rate and generalization error are almost invariant to network-size scaling, as long as $\rho = O(m^{-6})$. (ii) Validating condition (12) on $\rho$: We showcase the necessity of decreasing $\rho$ while increasing $m$. Given a connected base graph with associated $\bar{W}$, for the sequence of graphs generated with increasing $m \in \{10, 50, 100, 500\}$, we let $W = \bar{W}^T$, with $T$ such that $\rho \approx 0.85$ for all $m$. This eventually violates (12). Fig. 2a (resp. Fig. 2b) plots the normalized generalization error versus the iterations for $m = 1$, $m = 10$, and $m = 50$ (resp. $m = 100$ and $m = 500$, where the two curves are plotted only up to iterations $t = 70$ and $t = 17$, respectively). The figures demonstrate the necessity of scaling down $\rho$ as $m$ increases. In fact, both the rate and the achievable estimation error degrade (and eventually break down) as the network size increases while keeping $\rho$ fixed. We claim that this stems from the fact that, if the network is not sufficiently well connected, the in-network RIP does not hold with sufficiently small tolerance, leading to a failure of the early power-method phase and consequently producing an unrecoverable misalignment with the signal subspace. |
| Researcher Affiliation | Academia | Marie Maros, School of Industrial Engineering, Purdue University, mmaros@purdue.edu; Gesualdo Scutari, School of Industrial Engineering, Purdue University, gscutari@purdue.edu |
| Pseudocode | No | The paper presents the algorithm (3) using mathematical notation but does not include a clearly labeled “Pseudocode” or “Algorithm” block with structured steps. |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes generating its own synthetic data for experiments (“generate a random matrix X”, “symmetric measurement matrices are generated as $A_i = \tfrac{1}{2}(S_i + S_i^\top)$, where $S_i \in \mathbb{R}^{d \times d}$ have i.i.d. standard Gaussian elements”) and does not provide access information for a public dataset. |
| Dataset Splits | No | The paper mentions splitting the total sample size equally among agents but does not specify explicit train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | Yes | All simulations are performed on an Apple M2 Pro @ 3.5 GHz computer with 32 GB RAM, running macOS Ventura 13.3.1. |
| Software Dependencies | No | The paper mentions the operating system (“macOS Ventura 13.3.1”) but does not list specific software libraries or dependencies with version numbers (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We generate a random matrix $X \in \mathbb{R}^{d \times r_\star}$, with $d = 50$ and $r_\star = 2$, which we use throughout all experiments. The symmetric measurement matrices are generated as $A_i = \tfrac{1}{2}(S_i + S_i^\top)$, where the $S_i \in \mathbb{R}^{d \times d}$ have i.i.d. standard Gaussian elements. The communication networks are generated as Erdős–Rényi graphs with link activation probability $p = 0.05$ and different sizes $m$ (specified in each experiment below). For any generated graph, we set $\bar{W}$ according to Metropolis weights [28] and then let $W = \bar{W}^K$, with the integer $K$ chosen to meet condition (12) on $\rho$, resulting in $K$ communication rounds per agent/iteration. Finally, we choose a stepsize of $1/4$ and $\mu = d^{-3}$. |
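
The setup quoted above is concrete enough to sketch in code. Below is a minimal NumPy sketch of the data and network generation, not the authors' implementation: the seed, the connectivity check, the `rho_target` value, and the helper names (`W_bar`, `rho_of`) are illustrative assumptions, and since condition (12) itself is not quoted in the table, the target mixing parameter is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem dimensions from the paper's setup: d = 50, r_star = 2.
d, r_star = 50, 2
m = 5            # number of agents (varied across experiments)
N = 1000         # total sample size, split equally among agents
n = N // m       # measurements per agent

# Ground-truth low-rank factor X in R^{d x r_star}.
X = rng.standard_normal((d, r_star))

# Symmetric Gaussian measurement matrices A_i = (S_i + S_i^T) / 2.
A = rng.standard_normal((N, d, d))
A = 0.5 * (A + A.transpose(0, 2, 1))

# Linear measurements y_k = <A_k, X X^T>.
M_true = X @ X.T
y = np.einsum('kij,ij->k', A, M_true)

# Erdos-Renyi graph with activation probability p = 0.05, resampled until
# connected (Laplacian rank m-1); for small m and p this may take many tries.
p = 0.05
while True:
    adj = np.triu(rng.random((m, m)) < p, k=1)
    adj = adj | adj.T
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj.astype(int)
    if np.linalg.matrix_rank(lap) == m - 1:
        break

# Metropolis weights: w_ij = 1 / (1 + max(deg_i, deg_j)) on edges,
# diagonal chosen so each row sums to one.
W_bar = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if adj[i, j]:
            W_bar[i, j] = 1.0 / (1 + max(deg[i], deg[j]))
    W_bar[i, i] = 1.0 - W_bar[i].sum()

# Multiple gossip rounds per iteration: W = W_bar^K, with K chosen so that
# the mixing parameter rho = ||W - (1/m) 1 1^T||_2 is small enough.
rho_target = 1e-2   # placeholder; the paper ties this threshold to condition (12)
rho_of = lambda Wm: np.linalg.svd(Wm - np.ones((m, m)) / m, compute_uv=False)[0]
K, W = 1, W_bar.copy()
while rho_of(W) > rho_target:
    W = W @ W_bar
    K += 1
```

Taking higher powers of $\bar{W}$ (larger $K$) is what drives $\rho$ down; since the table states that condition (12) requires $\rho = O(1/m^6)$, more gossip rounds per iteration are needed as $m$ grows.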
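
Continuing from the sketch above, the loop below illustrates a generic decentralized gradient iteration (consensus with $W$, then a local gradient step on the quadratic sensing loss) together with the normalized test error plotted in Fig. 1a. The paper's exact update (3) is not quoted in the table, so this generic DGD form, the loss scaling inside `grad_fi`, the iteration count, and the use of the agent average for the error are all assumptions.

```python
def grad_fi(U, A_i, y_i):
    """Gradient of f_i(U) = (1/(4n)) * sum_k (<A_k, U U^T> - y_k)^2."""
    resid = np.einsum('kij,ij->k', A_i, U @ U.T) - y_i   # per-measurement residuals
    return np.einsum('k,kij->ij', resid, A_i) @ U / len(y_i)

# Small, identical initialization at every agent (scale mu = d^{-3}).
mu = d ** -3
U0 = mu * rng.standard_normal((d, r_star))
U = np.stack([U0] * m)              # U[i] is agent i's local iterate

eta = 0.25                          # stepsize 1/4 from the experiments
A_loc = A.reshape(m, n, d, d)       # agent i holds A_loc[i], y_loc[i]
y_loc = y.reshape(m, n)

errors = []
for t in range(300):
    # One consensus step with W (= W_bar^K), then a local gradient step.
    mixed = np.einsum('ij,jkl->ikl', W, U)
    grads = np.stack([grad_fi(U[i], A_loc[i], y_loc[i]) for i in range(m)])
    U = mixed - eta * grads
    # Normalized test error of the network-average iterate.
    U_avg = U.mean(axis=0)
    errors.append(np.linalg.norm(U_avg @ U_avg.T - M_true)
                  / np.linalg.norm(M_true))

print(f"final normalized test error: {errors[-1]:.3e}")
```

The quantities tracked in Figs. 1b and 1c ($\|(V_M^\perp)^\top V_{U^t}\|$ and $\sigma_{r_\star}(U^t)/\sigma_{r_\star}(X)$) can be computed from `U_avg` inside the same loop.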