On Distributed Averaging for Stochastic k-PCA

Authors: Aditya Bhaskara, Pruthuvi Maheshakya Wijewardena

NeurIPS 2019

Reproducibility assessment. Each variable below lists the extracted result followed by the supporting LLM response.
Research Type: Experimental
LLM Response: "Finally, we run experiments on both real and synthetic datasets (the latter gives us a way to control the eigenvalue gaps), and establish that our theoretical bounds are reflected accurately in practice." From Section 5 (Experiments): "We validate our results with experiments using synthetic and real datasets. We simulated a distributed environment on a single machine."
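Since the synthetic data exists precisely to control the eigenvalue gaps, a short sketch helps make that concrete. The paper's quoted text does not give its generator, so everything below is an assumption: Gaussian samples whose population covariance has a hand-picked spectrum, with the gap between the k-th and (k+1)-th eigenvalues set directly. All names and parameter values are illustrative.

```python
import numpy as np

def synthetic_gaussian(n, d, k, gap, rng):
    """Sample n points from N(0, Sigma), where Sigma has a controlled gap
    between its k-th and (k+1)-th eigenvalues (hypothetical generator)."""
    top = np.linspace(1.0, 0.9, k)              # top-k eigenvalues
    tail = np.linspace(0.9 - gap, 0.05, d - k)  # remaining eigenvalues, `gap` below
    eigvals = np.concatenate([top, tail])
    # Random orthonormal eigenbasis, then Sigma^(1/2) = Q diag(sqrt(eigvals)).
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    sqrt_sigma = Q * np.sqrt(eigvals)           # scales the columns of Q
    return rng.standard_normal((n, d)) @ sqrt_sigma.T

rng = np.random.default_rng(0)
X = synthetic_gaussian(n=1000, d=50, k=5, gap=0.3, rng=rng)
```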
Researcher Affiliation: Academia
LLM Response: "Aditya Bhaskara, School of Computing, University of Utah, bhaskara@cs.utah.edu; Maheshakya Wijewardena, School of Computing, University of Utah, pmaheshakya4@gmail.com"
Pseudocode: Yes
LLM Response: "Algorithm 1 Distributed Averaging (parameter k). Local: On each machine, compute the rank-k SVD of the empirical covariance matrix $A^{(j)}$, and send $V_k^{(j)}$ (as defined above) to the server. Server: On the central server, compute $A_k = \frac{1}{m}\sum_{j=1}^{m} \hat{V}^{(j)} (\hat{V}^{(j)})^\top$. Then output the top k eigenvalues and the corresponding eigenvectors of $A_k$."
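A minimal NumPy sketch of the quoted Algorithm 1. It assumes $\hat{V}^{(j)}$ is the matrix of top-k eigenvectors of the local empirical covariance; the paper's exact definition ("as defined above") is not in the quote, so any eigenvalue scaling it applies is omitted here, and all function names are ours.

```python
import numpy as np

def local_message(X_j, k):
    """Local step on machine j: top-k eigenvectors of the empirical
    covariance A^(j) (assumed to be what V-hat^(j) denotes)."""
    A_j = X_j.T @ X_j / X_j.shape[0]
    _, eigvecs = np.linalg.eigh(A_j)   # eigh returns ascending eigenvalues
    return eigvecs[:, -k:]             # d x k matrix of top-k eigenvectors

def server_step(messages, k):
    """Server step: A_k = (1/m) * sum_j V-hat^(j) (V-hat^(j))^T,
    then return the top-k eigenpairs of A_k."""
    A_k = sum(V @ V.T for V in messages) / len(messages)
    eigvals, eigvecs = np.linalg.eigh(A_k)
    return eigvals[::-1][:k], eigvecs[:, ::-1][:, :k]  # descending order

# Simulated run: m machines, each holding n local samples in R^d.
rng = np.random.default_rng(0)
m, n, d, k = 50, 500, 30, 5
msgs = [local_message(rng.standard_normal((n, d)), k) for _ in range(m)]
top_vals, top_vecs = server_step(msgs, k)
```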
Open Source Code: No
LLM Response: The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets: Yes
LLM Response: "We used 3 real datasets to evaluate our methods (Table 1) [13, 17, 6]. Each dataset has N points and d features."

Table 1: Dataset information
Dataset       N      d    r   t
MNIST-small   20000  196  5   15
NIPS-papers   11463  150  5   15
FMA-music     21314  518  10  70
Dataset Splits: No
LLM Response: The paper does not specify explicit training, validation, or test dataset splits using percentages or counts. It mentions 'samples from an unknown distribution' and 'n i.i.d. samples' per machine.
Hardware Specification: No
LLM Response: The paper mentions simulating a distributed environment on 'a single machine' but provides no specific hardware details (e.g., CPU or GPU models, memory).
Software Dependencies: No
LLM Response: The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup: No
LLM Response: The paper describes the synthetic data generation and real-data sampling procedures, as well as the number of machines (m = 50) and iterations (200). However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer settings), which are typically part of a detailed experimental setup.
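The quoted setup (m = 50 machines, 200 iterations) suggests a repeated-trial simulation on a single machine. The self-contained sketch below is our assumption of such a loop: each trial shards Gaussian data across 50 simulated machines, runs distributed averaging, and measures the distance to PCA on the pooled data. The data distribution, dimensions, and error metric are all illustrative, not the paper's.

```python
import numpy as np

def topk_eigvecs(A, k):
    """Top-k eigenvectors of a symmetric matrix (eigh is ascending)."""
    return np.linalg.eigh(A)[1][:, -k:]

def trial_error(rng, m=50, n=500, d=30, k=5):
    """One trial: distributed averaging vs. PCA on the pooled data.
    m = 50 follows the quoted setup; n, d, and the standard-Gaussian
    sampler are illustrative assumptions."""
    shards = [rng.standard_normal((n, d)) for _ in range(m)]
    local_V = [topk_eigvecs(X.T @ X / n, k) for X in shards]    # local step
    V_avg = topk_eigvecs(sum(V @ V.T for V in local_V) / m, k)  # server step
    pooled = np.vstack(shards)
    V_ref = topk_eigvecs(pooled.T @ pooled / len(pooled), k)    # reference
    # Distance between the two rank-k projection matrices (assumed metric).
    return np.linalg.norm(V_avg @ V_avg.T - V_ref @ V_ref.T)

rng = np.random.default_rng(0)
errors = [trial_error(rng) for _ in range(200)]  # 200 iterations, as quoted
print(f"mean subspace error over 200 trials: {np.mean(errors):.4f}")
```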