On Distributed Averaging for Stochastic k-PCA
Authors: Aditya Bhaskara, Pruthuvi Maheshakya Wijewardena
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we run experiments on both real and synthetic datasets (the latter gives us a way to control the eigenvalue gaps), and establish that our theoretical bounds are reflected accurately in practice. (Section 5, Experiments) We validate our results with experiments using synthetic and real datasets. We simulated a distributed environment on a single machine. |
| Researcher Affiliation | Academia | Aditya Bhaskara School of Computing University of Utah bhaskara@cs.utah.edu Maheshakya Wijewardena School of Computing University of Utah pmaheshakya4@gmail.com |
| Pseudocode | Yes | Algorithm 1, Distributed Averaging (parameter k). Local: on each machine j, compute the rank-k SVD of the empirical covariance matrix A^(j), and send V̂^(j) (as defined above) to the server. Server: on the central server, compute A_k = (1/m) Σ_{j=1}^{m} V̂^(j) (V̂^(j))^T, then output the top k eigenvalues and the corresponding eigenvectors of A_k. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We used 3 real datasets to evaluate our methods (Table 1) [13, 17, 6]. Each dataset has N points and d features. Table 1, dataset information (dataset, N, d, r, t): MNIST-small (20000, 196, 5, 15); NIPS-papers (11463, 150, 5, 15); FMA-music (21314, 518, 10, 70). |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits using percentages or counts. It mentions 'samples from an unknown distribution' and 'n i.i.d. samples' per machine. |
| Hardware Specification | No | The paper mentions simulating a distributed environment on 'a single machine' but provides no specific details about the hardware (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | No | The paper describes the setup for synthetic data generation and real data sampling, as well as the number of machines (m=50) and iterations (200). However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer settings), which are typically part of a detailed experimental setup. |
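The pseudocode row above can be made concrete. The following is a minimal sketch of the distributed-averaging scheme, not the authors' implementation; it assumes numpy and reads V̂^(j) as the top-k eigenvectors of the local covariance scaled by the square roots of their eigenvalues, so that V̂^(j)(V̂^(j))^T is the rank-k approximation of A^(j). All function names are hypothetical.

```python
import numpy as np

def local_sketch(X, k):
    """Local step on machine j: rank-k sketch of the empirical covariance."""
    n = X.shape[0]
    A = X.T @ X / n                      # empirical covariance (d x d)
    vals, vecs = np.linalg.eigh(A)       # ascending eigenpairs of symmetric A
    idx = np.argsort(vals)[::-1][:k]     # indices of the top-k eigenvalues
    # Scale eigenvectors by sqrt(eigenvalue) so V_hat @ V_hat.T
    # reproduces the rank-k approximation of A.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def server_aggregate(sketches, k):
    """Server step: average the m rank-k approximations, then re-decompose."""
    A_bar = sum(V @ V.T for V in sketches) / len(sketches)
    vals, vecs = np.linalg.eigh(A_bar)
    idx = np.argsort(vals)[::-1][:k]
    return vals[idx], vecs[:, idx]       # top-k eigenvalues and eigenvectors
```

On synthetic data with a planted low-rank spike (mirroring the paper's use of synthetic data to control eigenvalue gaps), the aggregated top-k eigenvectors should align closely with the planted subspace.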