Distributed Machine Learning with Sparse Heterogeneous Data

Authors: Dominic Richards, Sahand Negahban, Patrick Rebeschini

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Statistical Performance. Figure 1 plots the probability of recovery against the number of samples held by non-root nodes, Nv for v ∈ V \ {1}, with a fixed number of root agent samples N1 = 2s log(ed/s). Observe that, for both the path topology and the balanced tree topology, once the non-root nodes have more than approximately 30 samples, the solution to TVBP finds the correct support for all graph sizes. In contrast, recovering a signal with Basis Pursuit at the same sparsity level and dimension would require at least 80 samples, i.e. 2s log(ed/s). We therefore save approximately 50 samples for each non-root problem.
Researcher Affiliation Academia Dominic Richards, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, Dominic.Richards94@gmail.com; Sahand N Negahban, Department of Statistics and Data Science, Yale University, 24 Hillhouse Ave., New Haven, CT 06510, Sahand.Negahban@Yale.edu; Patrick Rebeschini, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, Patrick.Rebeschini@stats.ox.ac.uk
Pseudocode No The paper describes algorithm steps in text within Appendix A.2 'ADMM for TVBP' but does not present them in a formally structured pseudocode or algorithm block.
Open Source Code No The paper does not provide an explicit statement about releasing its source code for the described methodology or a link to a code repository.
Open Datasets Yes Hyperspectral Unmixing. We apply Total Variation Basis Pursuit Denoising to the popular AVIRIS Cuprite mine reflectance dataset https://aviris.jpl.nasa.gov/data/free_data.html with a subset of the USGS library splib07 [26].
Dataset Splits No The paper describes the number of samples Nv for agents and the problem parameters (d, s, s′) but does not provide explicit train/test/validation dataset split percentages, counts, or a specific splitting methodology.
Hardware Specification No The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies No The paper mentions software packages like 'SPGL1 Python package', 'CVXOPT', and 'SUnSAL' but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup Yes Statistical Performance. Figure 1 plots the probability of recovery against the number of agent samples Nv for v ∈ V \ {1}, with a fixed number of root agent samples N1 = 2s log(ed/s). Problem setting: d = 128, s = 12, s′ = 4 and N1 = 2s log(ed/s) = 80, for a path (Left) and a balanced tree with branches of size 2 (Right). Lines indicate graph sizes n ∈ {2, 4, 8, 16} for the path and n ∈ {7, 15, 31} for the balanced tree, with heights {2, 3, 4} respectively. Solution to reformulated problem (11) found using CVXOPT. Each point is an average of 20 replications. Signal sampled from {+1, −1}, differences concatenation of s values. {Av}v∈V standard Gaussian and G̃ = G.
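The arithmetic behind the quoted setup can be checked with a short sketch. It assumes that N1 = 2s log(ed/s) uses the natural logarithm with e as Euler's number and is rounded down to an integer, and that a balanced tree of height h with branching factor 2 has its root at height 0. The function names `root_sample_count` and `balanced_tree_size` are hypothetical helpers introduced here for illustration; they do not come from the paper.

```python
import math


def root_sample_count(d: int, s: int) -> int:
    """Return floor(2 * s * log(e * d / s)).

    Rounding down is an assumption; the setup only states the value 80.
    """
    return math.floor(2 * s * math.log(math.e * d / s))


def balanced_tree_size(height: int, branching: int = 2) -> int:
    """Count the nodes of a balanced tree whose root sits at height 0."""
    return sum(branching ** level for level in range(height + 1))


d, s = 128, 12                      # dimension and root sparsity from the setup
n1 = root_sample_count(d, s)        # root agent samples
tree_sizes = [balanced_tree_size(h) for h in (2, 3, 4)]

# 30 is the approximate non-root threshold reported for Figure 1.
savings_per_non_root_node = n1 - 30

print(n1, tree_sizes, savings_per_non_root_node)
```

Under these assumptions the script reproduces the quoted figures: 80 root samples, balanced tree sizes of 7, 15 and 31 nodes, and a saving of roughly 50 samples per non-root node.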