Provable Tensor Factorization with Missing Data

Authors: Prateek Jain, Sewoong Oh

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Simulations suggest that the dependence of the sample size on the dimensionality n is indeed tight. Empirical results: Theorem 1.1 guarantees exact recovery when p ≥ C r^5 (log n)^4 / n^(3/2). Numerical experiments show that the average recovery rate converges to a universal curve over α, where p = α r^(1/2) ln(n) / ((1 − ρ) n^(3/2)) (Figure 1). The bound is tight in its dependence on n up to a poly-logarithmic factor, but loose in its dependence on the rank r.
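The sampling-probability scaling reported above is a closed-form expression, so it can be evaluated directly. A minimal sketch, assuming the constant-free form p = α r^(1/2) ln(n) / ((1 − ρ) n^(3/2)) from Figure 1 (the function name and example parameter values are illustrative, not from the paper):

```python
import math

def sampling_probability(n, r, alpha, rho):
    """Per-entry sampling probability p = alpha * sqrt(r) * ln(n) / ((1 - rho) * n^(3/2)),
    the scaling under which the paper reports a universal recovery curve over alpha."""
    return alpha * math.sqrt(r) * math.log(n) / ((1.0 - rho) * n ** 1.5)

# Illustrative values: n = 100, rank r = 5, alpha = 10, rho = 0 (orthogonal components)
p = sampling_probability(100, 5, 10.0, 0.0)
expected_samples = p * 100 ** 3  # expected number of observed entries in an n x n x n tensor
```

Note the n^(3/2) denominator against the n^3 total entries: the observed fraction shrinks as n grows, which is why the n-dependence of the bound is the quantity the simulations probe.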
Researcher Affiliation: Collaboration. Prateek Jain, Microsoft Research, Bangalore, India (prajain@microsoft.com); Sewoong Oh, Dept. of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801 (swoh@illinois.edu).
Pseudocode: Yes.

Algorithm 1: Alternating Minimization for Tensor Completion
1: Input: P_Ω(T), Ω, r, τ, µ
2: Initialize with [(u_1^0, σ_1), (u_2^0, σ_2), ..., (u_r^0, σ_r)] = RTPM(P_Ω(T), r) (RTPM of [1])
3: [u_1, u_2, ..., u_r] = Threshold([u_1^0, u_2^0, ..., u_r^0], µ) (clipping scheme of [4])
4: for all t = 1, 2, ..., τ do
5:   /* OUTER LOOP */
6:   for all q = 1, 2, ..., r do
7:     /* INNER LOOP */
8:     û_q^(t+1) = argmin_u || P_Ω(T − u ⊗ u_q ⊗ u_q − Σ_{ℓ≠q} σ_ℓ u_ℓ ⊗ u_ℓ ⊗ u_ℓ) ||_F^2
9:     σ_q^(t+1) = || û_q^(t+1) ||_2
10:    u_q^(t+1) = û_q^(t+1) / || û_q^(t+1) ||_2
11:   end for
12:   [u_1, u_2, ..., u_r] ← [u_1^(t+1), u_2^(t+1), ..., u_r^(t+1)]
13:   [σ_1, σ_2, ..., σ_r] ← [σ_1^(t+1), σ_2^(t+1), ..., σ_r^(t+1)]
14: end for
15: Output: T̂ = Σ_{q∈[r]} σ_q (u_q ⊗ u_q ⊗ u_q)
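The alternating-minimization loop can be sketched in NumPy. This is a hedged sketch, not the paper's implementation: random unit-vector initialization stands in for the RTPM and thresholding steps, components are updated in place within a sweep, and the coordinate-wise least squares assumes the symmetric rank-r model T = Σ_q σ_q u_q ⊗ u_q ⊗ u_q:

```python
import numpy as np

def altmin_tensor_completion(T_obs, mask, r, tau=50, seed=0):
    """Sketch of Algorithm 1's alternating-minimization loop for a symmetric
    n x n x n tensor. T_obs holds the observed entries (zeros elsewhere),
    mask is a boolean array marking Omega. Initialization uses random unit
    vectors instead of the paper's RTPM + thresholding (an assumption)."""
    n = T_obs.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, r))
    U /= np.linalg.norm(U, axis=0)
    sigma = np.ones(r)
    for _ in range(tau):                     # outer loop: sweeps
        for q in range(r):                   # inner loop: components
            # Residual after removing every component except q.
            R = T_obs - sum(sigma[l] * np.einsum('i,j,k->ijk', U[:, l], U[:, l], U[:, l])
                            for l in range(r) if l != q)
            v = U[:, q]
            W = np.einsum('j,k->jk', v, v)   # rank-1 weights over modes 2 and 3
            # Least squares for u_q separates per coordinate i over observed entries.
            num = np.einsum('ijk,jk->i', R * mask, W)
            den = np.einsum('ijk,jk->i', mask.astype(float), W ** 2)
            u_hat = num / np.maximum(den, 1e-12)
            sigma[q] = np.linalg.norm(u_hat)
            U[:, q] = u_hat / max(sigma[q], 1e-12)
    return sum(sigma[q] * np.einsum('i,j,k->ijk', U[:, q], U[:, q], U[:, q])
               for q in range(r))
```

The key point the sketch makes explicit is that, with the other two modes fixed, the minimization in line 8 of Algorithm 1 is an ordinary least-squares problem that decouples across the coordinates of u_q.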
Open Source Code: Yes. A MATLAB implementation of Algorithm 1 used to run the experiments is available at http://web.engr.illinois.edu/~swoh/software/optspace.
Open Datasets: No. The paper describes generating synthetic data for its experiments ("We generate orthogonal matrices U = [u_1, ..., u_r] ∈ R^(n×r) uniformly at random...") but does not provide access information (link, citation, or repository) for a publicly available or open dataset. It only describes the data-generation process.
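The described generation procedure is simple enough to reconstruct. A minimal sketch, assuming QR of a Gaussian matrix for the uniformly random orthogonal U and unit weights σ_q = 1 (the weights and function name are illustrative assumptions, not stated in this excerpt):

```python
import numpy as np

def generate_synthetic_tensor(n, r, seed=0):
    """Sample an orthogonal U = [u_1, ..., u_r] in R^(n x r) uniformly at random
    (QR of a standard Gaussian matrix) and form the symmetric rank-r tensor
    T = sum_q u_q (x) u_q (x) u_q.  Unit weights sigma_q = 1 are an assumption."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, r)))  # orthonormal columns
    T = sum(np.einsum('i,j,k->ijk', Q[:, q], Q[:, q], Q[:, q]) for q in range(r))
    return T, Q
```

Observed entries for an experiment would then be drawn by sampling each index (i, j, k) independently with the probability p discussed above.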
Dataset Splits: No. The paper does not explicitly provide training, validation, and test splits. It mentions random sampling of entries during tensor generation, but no standard dataset splits for reproducibility.
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies: No. The paper mentions a MATLAB implementation but does not specify any software dependencies with version numbers.
Experiment Setup: No. The paper describes how the tensors are generated and defines the error metric, but it does not specify concrete experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs) or system-level training settings.