Fast and Guaranteed Tensor Decomposition via Sketching

Authors: Yining Wang, Hsiao-Yu Tung, Alexander J. Smola, Anima Anandkumar

NeurIPS 2015

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental
"Experiments on synthetic and real-world datasets show highly competitive results. We demonstrate a 10x to 100x speed-up over exact methods for decomposing dense, high-dimensional tensors. For topic modeling, we show a significant reduction in computational time over existing spectral LDA implementations with only a small loss in performance. In addition, our proposed algorithm outperforms collapsed Gibbs sampling when running time is constrained. We also show that if a Gibbs sampler is initialized with our output topics, it converges within a few iterations and outperforms a randomly initialized Gibbs sampler run for many more iterations."

Researcher Affiliation: Academia
"Yining Wang, Hsiao-Yu Tung, Alex Smola, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, {yiningwa,htung}@cs.cmu.edu, alex@smola.org; Anima Anandkumar, Department of EECS, University of California Irvine, Irvine, CA 92697, a.anandkumar@uci.edu"

Pseudocode: Yes
"Algorithm 1: Fast robust tensor power method"

Open Source Code: No
The paper makes no explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository.

Open Datasets: Yes
"We compare our proposed fast spectral LDA algorithm with baseline spectral methods and collapsed Gibbs sampling (using the GibbsLDA++ [25] implementation) on two real-world datasets: Wikipedia and Enron."

Dataset Splits: No
"Obtained topic models Φ ∈ R^{V×K} are evaluated on a held-out dataset consisting of 1000 documents randomly picked from the training datasets." (This describes a held-out test set, but not a separate validation set or specific train/validation/test splits for reproducibility.)

Hardware Specification: Yes
"All methods are implemented in C++ and tested on a single machine with 8 Intel X5550 @ 2.67 GHz CPUs and 32 GB memory."

Software Dependencies: No
"All methods are implemented in C++ and tested on a single machine with 8 Intel X5550 @ 2.67 GHz CPUs and 32 GB memory. We compare our proposed fast spectral LDA algorithm with baseline spectral methods and collapsed Gibbs sampling (using the GibbsLDA++ [25] implementation)..." (No specific version numbers are given for the C++ compiler or GibbsLDA++.)

Experiment Setup: Yes
"For the robust tensor power method the parameters are set to L = 50 and T = 30. For ALS we iterate until convergence, or until a maximum of 1000 iterations is reached. α0 is set to 1.0 and B is set to 30. For collapsed Gibbs sampling, we set α = 50/K and β = 0.1 following [11]."
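The pseudocode the paper reports is Algorithm 1, a fast (sketched) robust tensor power method. Setting the sketching acceleration aside, the underlying robust tensor power method can be sketched in NumPy as below. This is an illustrative reconstruction, not the paper's implementation: the function names and the small orthogonal test tensor are ours, while the defaults `L=50` and `T_iters=30` mirror the paper's reported settings L = 50, T = 30.

```python
import numpy as np

def tensor_apply(T, theta):
    # T(I, theta, theta): contract the last two modes with theta
    return np.einsum('ijk,j,k->i', T, theta, theta)

def robust_tensor_power_method(T, rank, L=50, T_iters=30, seed=0):
    """Recover rank-1 components (lambda_i, v_i) of a symmetric tensor.

    Plain (unsketched) robust tensor power method with random restarts
    and deflation; L and T_iters mirror the paper's L = 50, T = 30.
    """
    rng = np.random.default_rng(seed)
    k = T.shape[0]
    T = T.copy()
    eigvals, eigvecs = [], []
    for _ in range(rank):
        best_lam, best_theta = -np.inf, None
        for _ in range(L):  # L random restarts
            theta = rng.standard_normal(k)
            theta /= np.linalg.norm(theta)
            for _ in range(T_iters):  # power iterations
                theta = tensor_apply(T, theta)
                theta /= np.linalg.norm(theta)
            lam = np.einsum('ijk,i,j,k->', T, theta, theta, theta)
            if lam > best_lam:
                best_lam, best_theta = lam, theta
        eigvals.append(best_lam)
        eigvecs.append(best_theta)
        # deflate: subtract the recovered rank-1 component
        T -= best_lam * np.einsum('i,j,k->ijk',
                                  best_theta, best_theta, best_theta)
    return np.array(eigvals), np.array(eigvecs)

# Demo on a synthetic orthogonal tensor T = 2*e1^(x3) + 1*e2^(x3)
e1, e2 = np.eye(3)[0], np.eye(3)[1]
T_demo = (2 * np.einsum('i,j,k->ijk', e1, e1, e1)
          + np.einsum('i,j,k->ijk', e2, e2, e2))
lams, vecs = robust_tensor_power_method(T_demo, rank=2)
```

On this toy tensor the method recovers the eigenvalues 2 and 1 and the corresponding basis vectors; the paper's contribution is making each `tensor_apply` contraction fast via sketching rather than the exact einsum used here.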
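The paper's reported 10x to 100x speed-ups come from replacing exact tensor contractions with sketch-based approximations. As background only, here is a minimal count-sketch primitive; it is not the paper's FFT-based tensor sketch, and the explicit hash arrays `h` and `s` are a toy stand-in for the pairwise-independent hash families a real implementation would use.

```python
import numpy as np

def count_sketch(x, b, h, s):
    """Compress vector x into b buckets: c[h[i]] += s[i] * x[i]."""
    c = np.zeros(b)
    for i, xi in enumerate(x):
        c[h[i]] += s[i] * xi
    return c

rng = np.random.default_rng(0)
n, b = 10, 4
h = rng.integers(0, b, size=n)        # bucket hash (toy: random array)
s = rng.choice([-1.0, 1.0], size=n)   # sign hash

x = np.zeros(n); x[5] = 3.0
cx = count_sketch(x, b, h, s)

# A 1-sparse coordinate is recovered exactly from the sketch:
estimate = s[5] * cx[h[5]]   # == 3.0

# Inner products are preserved (here exactly, since x and y
# share their single nonzero index):
y = np.zeros(n); y[5] = 2.0
cy = count_sketch(y, b, h, s)
ip = float(np.dot(cx, cy))   # == 6.0 == np.dot(x, y)
```

For dense inputs the estimate is only unbiased, so practical schemes average several independent sketches; the paper builds on this idea to sketch symmetric tensors so that the power-iteration contractions run in sketch space.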