On Distributed Averaging for Stochastic k-PCA

Authors: Aditya Bhaskara, Pruthuvi Maheshakya Wijewardena

NeurIPS 2019

Reproducibility assessment. Each variable below lists the extracted result followed by the supporting LLM response.
Research Type: Experimental
LLM Response: "Finally, we run experiments on both real and synthetic datasets (the latter gives us a way to control the eigenvalue gaps), and establish that our theoretical bounds are reflected accurately in practice." From Section 5 (Experiments): "We validate our results with experiments using synthetic and real datasets. We simulated a distributed environment on a single machine."
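Since the synthetic data exists precisely to control the eigenvalue gaps, a short sketch helps make that concrete. The paper's quoted text does not give its generator, so everything below is an assumption: Gaussian samples whose population covariance has a hand-picked spectrum, with the gap between the k-th and (k+1)-th eigenvalues set directly. All names and parameter values are illustrative.

```python
import numpy as np

def synthetic_gaussian(n, d, k, gap, rng):
    """Sample n points from N(0, Sigma), where Sigma has a controlled gap
    between its k-th and (k+1)-th eigenvalues (hypothetical generator)."""
    top = np.linspace(1.0, 0.9, k)              # top-k eigenvalues
    tail = np.linspace(0.9 - gap, 0.05, d - k)  # remaining eigenvalues, `gap` below
    eigvals = np.concatenate([top, tail])
    # Random orthonormal eigenbasis, then Sigma^(1/2) = Q diag(sqrt(eigvals)).
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    sqrt_sigma = Q * np.sqrt(eigvals)           # scales the columns of Q
    return rng.standard_normal((n, d)) @ sqrt_sigma.T

rng = np.random.default_rng(0)
X = synthetic_gaussian(n=1000, d=50, k=5, gap=0.3, rng=rng)
```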
Researcher Affiliation: Academia
LLM Response: "Aditya Bhaskara, School of Computing, University of Utah, bhaskara@cs.utah.edu; Maheshakya Wijewardena, School of Computing, University of Utah, pmaheshakya4@gmail.com"
Pseudocode: Yes
LLM Response: "Algorithm 1 Distributed Averaging (parameter k). Local: On each machine, compute the rank-k SVD of the empirical covariance matrix $A^{(j)}$, and send $V_k^{(j)}$ (as defined above) to the server. Server: On the central server, compute $A_k = \frac{1}{m}\sum_{j=1}^{m} \hat{V}^{(j)} (\hat{V}^{(j)})^\top$. Then output the top k eigenvalues and the corresponding eigenvectors of $A_k$."
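A minimal NumPy sketch of the quoted Algorithm 1. It assumes $\hat{V}^{(j)}$ is the matrix of top-k eigenvectors of the local empirical covariance; the paper's exact definition ("as defined above") is not in the quote, so any eigenvalue scaling it applies is omitted here, and all function names are ours.

```python
import numpy as np

def local_message(X_j, k):
    """Local step on machine j: top-k eigenvectors of the empirical
    covariance A^(j) (assumed to be what V-hat^(j) denotes)."""
    A_j = X_j.T @ X_j / X_j.shape[0]
    _, eigvecs = np.linalg.eigh(A_j)   # eigh returns ascending eigenvalues
    return eigvecs[:, -k:]             # d x k matrix of top-k eigenvectors

def server_step(messages, k):
    """Server step: A_k = (1/m) * sum_j V-hat^(j) (V-hat^(j))^T,
    then return the top-k eigenpairs of A_k."""
    A_k = sum(V @ V.T for V in messages) / len(messages)
    eigvals, eigvecs = np.linalg.eigh(A_k)
    return eigvals[::-1][:k], eigvecs[:, ::-1][:, :k]  # descending order

# Simulated run: m machines, each holding n local samples in R^d.
rng = np.random.default_rng(0)
m, n, d, k = 50, 500, 30, 5
msgs = [local_message(rng.standard_normal((n, d)), k) for _ in range(m)]
top_vals, top_vecs = server_step(msgs, k)
```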
Open Source Code: No
LLM Response: The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets: Yes
LLM Response: "We used 3 real datasets to evaluate our methods (Table 1) [13, 17, 6]. Each dataset has N points and d features."

Table 1: Dataset information
Dataset       N      d    r   t
MNIST-small   20000  196  5   15
NIPS-papers   11463  150  5   15
FMA-music     21314  518  10  70
Dataset Splits: No
LLM Response: The paper does not specify explicit training, validation, or test dataset splits using percentages or counts. It mentions 'samples from an unknown distribution' and 'n i.i.d. samples' per machine.
Hardware Specification: No
LLM Response: The paper mentions simulating a distributed environment on 'a single machine' but provides no specific hardware details (e.g., CPU or GPU models, memory).
Software Dependencies: No
LLM Response: The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup: No
LLM Response: The paper describes the synthetic data generation and real-data sampling procedures, as well as the number of machines (m = 50) and iterations (200). However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer settings), which are typically part of a detailed experimental setup.
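The quoted setup (m = 50 machines, 200 iterations) suggests a repeated-trial simulation on a single machine. The self-contained sketch below is our assumption of such a loop: each trial shards Gaussian data across 50 simulated machines, runs distributed averaging, and measures the distance to PCA on the pooled data. The data distribution, dimensions, and error metric are all illustrative, not the paper's.

```python
import numpy as np

def topk_eigvecs(A, k):
    """Top-k eigenvectors of a symmetric matrix (eigh is ascending)."""
    return np.linalg.eigh(A)[1][:, -k:]

def trial_error(rng, m=50, n=500, d=30, k=5):
    """One trial: distributed averaging vs. PCA on the pooled data.
    m = 50 follows the quoted setup; n, d, and the standard-Gaussian
    sampler are illustrative assumptions."""
    shards = [rng.standard_normal((n, d)) for _ in range(m)]
    local_V = [topk_eigvecs(X.T @ X / n, k) for X in shards]    # local step
    V_avg = topk_eigvecs(sum(V @ V.T for V in local_V) / m, k)  # server step
    pooled = np.vstack(shards)
    V_ref = topk_eigvecs(pooled.T @ pooled / len(pooled), k)    # reference
    # Distance between the two rank-k projection matrices (assumed metric).
    return np.linalg.norm(V_avg @ V_avg.T - V_ref @ V_ref.T)

rng = np.random.default_rng(0)
errors = [trial_error(rng) for _ in range(200)]  # 200 iterations, as quoted
print(f"mean subspace error over 200 trials: {np.mean(errors):.4f}")
```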