Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Correlation Aware Sparsified Mean Estimation Using Random Projection
Authors: Shuli Jiang, Pranay Sharma, Gauri Joshi
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world distributed optimization tasks showcase the superior performance of Rand-Proj-Spatial compared to Rand-k-Spatial and other more sophisticated sparsification techniques. We conduct experiments on common distributed optimization tasks, and demonstrate the superior performance of Rand-Proj-Spatial compared to existing sparsification techniques. |
| Researcher Affiliation | Academia | Shuli Jiang Robotics Institute Carnegie Mellon University EMAIL Pranay Sharma ECE Carnegie Mellon University EMAIL Gauri Joshi ECE Carnegie Mellon University EMAIL |
| Pseudocode | No | The paper describes its methods using mathematical equations and textual descriptions, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | All code used for the experiments can be found at https://github.com/11hifish/Rand-Proj-Spatial. |
| Open Datasets | Yes | For both distributed power iteration and distributed k-means, we use the test set of the Fashion-MNIST dataset [56] consisting of 10000 samples. ... We use the UJIndoor dataset for distributed linear regression. |
| Dataset Splits | No | The paper mentions using 'the test set of the Fashion-MNIST dataset' and that datasets are 'split IID across the clients via random shuffling' or 'non-IID'. However, it does not explicitly provide percentages or counts for training, validation, and test splits for the full datasets to reproduce the data partitioning. |
| Hardware Specification | No | The paper states: 'All experiments are conducted in a cluster of 20 machines, each of which has 40 cores.' However, it does not provide specific details such as CPU model, GPU model, or memory specifications. |
| Software Dependencies | No | The paper states: 'The implementation is in Python, mainly based on numpy and scipy.' However, it does not specify version numbers for Python or the libraries (numpy, scipy) used. |
| Experiment Setup | Yes | For Rand-Proj-Spatial, we use the first 50 iterations to estimate β (see Eq. 5)... We repeat the experiments across 10 independent runs... For both distributed power iterations and distributed k-means, we run the experiments for 30 iterations and consider two different settings: n = 10, k = 102 and n = 50, k = 20. For distributed linear regression, we run the experiments for 50 iterations with learning rate 0.001. |
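
The Dataset Splits row notes that the data are "split IID across the clients via random shuffling" without giving the exact partitioning. One plausible reading of that procedure is sketched below, not the authors' implementation. The function name `split_iid`, the seed, and the choice of near-equal shard sizes are assumptions made for illustration; only the sample count (10000) and the client counts (n = 10 and n = 50) come from the table above.

```python
import numpy as np


def split_iid(num_samples, n_clients, seed=0):
    """Shuffle sample indices once and deal them into n_clients near-equal shards.

    Hypothetical helper: the paper does not publish its exact partitioning code here.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_samples)
    # array_split tolerates sample counts that do not divide evenly across clients.
    return np.array_split(shuffled, n_clients)


NUM_SAMPLES = 10_000  # size of the Fashion-MNIST test set quoted above

# The two client counts quoted in the Experiment Setup row.
for n_clients in (10, 50):
    shards = split_iid(NUM_SAMPLES, n_clients)
    print(n_clients, "clients,", len(shards[0]), "samples in the first shard")
```

Fixing the seed makes the partition repeatable across runs, which is the detail the Dataset Splits row flags as missing. A non-IID split would need a different rule (for example, grouping by label), which the table does not specify.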