Distributed Machine Learning with Sparse Heterogeneous Data
Authors: Dominic Richards, Sahand Negahban, Patrick Rebeschini
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Statistical Performance. Figure 1 plots the probability of recovery against the number of samples held by non-root nodes $N_v$ for $v \in V \setminus \{1\}$, with a fixed number of root agent samples $N_1 = 2s\log(ed/s)$. Observe, for a path topology and balanced tree topology, once the non-root nodes have beyond approximately 30 samples, the solution to TVBP finds the correct support for all graph sizes. In contrast, recovering a signal with Basis Pursuit at the same level of sparsity and dimension would require at least 80 samples, i.e. $2s\log(ed/s)$. We therefore save approximately 50 samples for each non-root problem. |
| Researcher Affiliation | Academia | Dominic Richards, Department of Statistics, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB, Dominic.Richards94@gmail.com; Sahand N Negahban, Department of Statistics and Data Science, Yale University, 24 Hillhouse Ave., New Haven, CT 06510, Sahand.Negahban@Yale.edu; Patrick Rebeschini, Department of Statistics, University of Oxford, 24-29 St Giles, Oxford, OX1 3LB, Patrick.Rebeschini@stats.ox.ac.uk |
| Pseudocode | No | The paper describes algorithm steps in text within Appendix A.2 'ADMM for TVBP' but does not present them in a formally structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | Hyperspectral Unmixing. We apply Total Variation Basis Pursuit Denoising to the popular AVIRIS Cuprite mine reflectance dataset https://aviris.jpl.nasa.gov/data/free_data.html with a subset of the USGS library splib07 [26]. |
| Dataset Splits | No | The paper describes the number of samples $N_v$ for agents and the problem parameters ($d$, $s$, $s'$) but does not provide explicit train/test/validation dataset split percentages, counts, or a specific splitting methodology. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software packages like 'SPGL1 Python package', 'CVXOPT', and 'SUnSAL' but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | Statistical Performance. Figure 1 plots the probability of recovery against the number of agent samples $N_v$ for $v \in V \setminus \{1\}$, with a fixed number of root agent samples $N_1 = 2s\log(ed/s)$. Problem setting: $d = 128$, $s = 12$, $s' = 4$ and $N_1 = 2s\log(ed/s) = 80$, for path (Left) and balanced tree with branches of size 2 (Right). Lines indicate graph sizes with $n \in \{2, 4, 8, 16\}$ for path and $n \in \{7, 15, 31\}$ for balanced tree with heights of $\{2, 3, 4\}$ respectively. Solution to reformulated problem (11) found using CVXOPT. Each point is an average of 20 replications. Signal sampled from $\{+1, -1\}$, differences concatenation of $s$ values. $\{A_v\}_{v \in V}$ standard Gaussian and $\widetilde{G} = G$. |
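
The numbers quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the helper names `root_sample_count` and `balanced_tree_size` are hypothetical, and it assumes the natural logarithm and rounding down, which is consistent with the reported value of 80 but is not stated in the excerpt.

```python
import math


def root_sample_count(d: int, s: int) -> float:
    """Return 2 * s * log(e * d / s), assuming the natural logarithm."""
    return 2 * s * math.log(math.e * d / s)


def balanced_tree_size(height: int, branching: int = 2) -> int:
    """Count the nodes of a complete tree with the given height and branching factor."""
    return sum(branching ** level for level in range(height + 1))


# Problem setting quoted in the table: d = 128, s = 12.
d, s = 128, 12
n1_exact = root_sample_count(d, s)
n1 = math.floor(n1_exact)  # assumption: the reported 80 corresponds to rounding down

# Balanced trees with branches of size 2 and heights 2, 3, 4.
tree_sizes = [balanced_tree_size(h) for h in (2, 3, 4)]

# Samples saved per non-root node if roughly 30 samples suffice instead of n1.
saved_per_node = n1 - 30

print(f"2 s log(ed/s) = {n1_exact:.2f}, rounded down to {n1}")
print(f"balanced tree sizes: {tree_sizes}")
print(f"approximate saving per non-root node: {saved_per_node}")
```

Under these assumptions the tree sizes match the quoted $n \in \{7, 15, 31\}$, and the saving of about 50 samples per non-root node follows from 80 minus the roughly 30 samples reported as sufficient.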