Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distributed Machine Learning with Sparse Heterogeneous Data
Authors: Dominic Richards, Sahand Negahban, Patrick Rebeschini
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Statistical Performance. Figure 1 plots the probability of recovery against the number of samples held by non-root nodes $N_v$ for $v \in V \setminus \{1\}$ with a fixed number of root agent samples $N_1 = 2s \log(ed/s)$. Observe, for a path topology and balanced tree topology, once the non-root nodes have beyond approximately 30 samples, the solution to TVBP finds the correct support for all graph sizes. In contrast, the number of samples required to recover a signal with Basis Pursuit at the same level of sparsity and dimension considered would require at least 80 samples, i.e. $2s \log(ed/s)$. We therefore save approximately 50 for each non-root problem. |
| Researcher Affiliation | Academia | Dominic Richards, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, EMAIL; Sahand N Negahban, Department of Statistics and Data Science, Yale University, 24 Hillhouse Ave., New Haven, CT 06510, EMAIL; Patrick Rebeschini, Department of Statistics, University of Oxford, 24-29 St Giles', Oxford, OX1 3LB, EMAIL |
| Pseudocode | No | The paper describes algorithm steps in text within Appendix A.2 'ADMM for TVBP' but does not present them in a formally structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | Hyperspectral Unmixing. We apply Total Variation Basis Pursuit Denoising to the popular AVIRIS Cuprite mine reflectance dataset https://aviris.jpl.nasa.gov/data/free_data.html with a subset of the USGS library splib07 [26]. |
| Dataset Splits | No | The paper describes the number of samples $N_v$ held by each agent and the problem parameters ($d$, $s$, $s'$) but does not provide explicit train/test/validation dataset split percentages, counts, or a specific splitting methodology. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software packages like 'SPGL1 Python package', 'CVXOPT', and 'SUnSAL' but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | Statistical Performance. Figure 1 plots the probability of recovery against the number of agents samples $N_v$ for $v \in V \setminus \{1\}$ with a fixed number of root agent samples $N_1 = 2s \log(ed/s)$. Problem setting $d = 128$, $s = 12$, $s' = 4$ and $N_1 = 2s \log(ed/s) = 80$, for path (Left) and balanced tree with branches of size 2 (Right). Lines indicate graph sizes with $n \in \{2, 4, 8, 16\}$ for path and $n \in \{7, 15, 31\}$ for balanced tree with heights of $\{2, 3, 4\}$ respectively. Solution to reformulated problem (11) found using CVXOPT. Each point is an average of 20 replications. Signal sampled from $\{+1, -1\}$, differences concatenation of $s$ values. $\{A_v\}_{v \in V}$ standard Gaussian and $\tilde{G} = G$. |
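
The sample counts and graph sizes quoted in the table can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the logarithm is natural and that the reported root sample count of 80 comes from rounding 2s log(ed/s) down, and the helper names `basis_pursuit_samples` and `balanced_tree_size` are hypothetical.

```python
import math


def basis_pursuit_samples(d: int, s: int) -> float:
    """Evaluate 2 s log(e d / s), assuming a natural logarithm."""
    return 2 * s * math.log(math.e * d / s)


def balanced_tree_size(height: int, branching: int = 2) -> int:
    """Count the nodes of a balanced tree with the given height and branching factor."""
    return sum(branching ** level for level in range(height + 1))


# Problem setting quoted in the Experiment Setup row.
d, s = 128, 12

raw_threshold = basis_pursuit_samples(d, s)
# Assumption: the reported value of 80 is the threshold rounded down.
root_samples = math.floor(raw_threshold)

# Approximate non-root sample count quoted in the Research Type row.
non_root_samples = 30
saving_per_node = root_samples - non_root_samples

# Balanced trees with branches of size 2 and heights 2, 3, 4.
tree_sizes = [balanced_tree_size(h) for h in (2, 3, 4)]

print(f"2 s log(ed/s) = {raw_threshold:.2f}, rounded down to {root_samples}")
print(f"Saving per non-root node: about {saving_per_node} samples")
print(f"Balanced tree sizes: {tree_sizes}")
```

Under these assumptions the numbers agree with the quoted text: a threshold of 80 root samples, a saving of roughly 50 samples per non-root node, and balanced tree sizes of 7, 15 and 31.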