Provable Non-linear Inductive Matrix Completion

Authors: Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we apply our model on synthetic datasets and verify our theoretical analysis. In this section, we generate some synthetic datasets to verify the sample complexity and the convergence of gradient descent. For simplicity, we apply gradient descent with initialization W^(0) = (1 − α)W* + αW^(r), where W* is the ground truth, W^(r) is a Gaussian random matrix, and α ∈ [0, 1]. In Fig. 1 (a)(b), α = 0.1, and in Fig. 1 (c)(d), α = 1. The sampling rule for the observations follows our previous assumptions. For each (n, m) pair, we make 5 trials and take the average of the successful recovery times. We say a solution (U, V) successfully recovers the ground-truth parameters when it achieves 0.001 relative testing error, i.e., ‖φ(X_t U)φ(X_t V)^⊤ − φ(X_t U*)φ(X_t V*)^⊤‖_F ≤ 0.001 ‖φ(X_t U*)φ(X_t V*)^⊤‖_F, where X_t ∈ ℝ^{n×d} is a newly sampled testing dataset. For both ReLU and sigmoid, we minimize the original objective function (3)."
Researcher Affiliation | Collaboration | Kai Zhong, Amazon, kaizhong@amazon.com; Zhao Song, University of Washington, magic.linuxkde@gmail.com; Prateek Jain, Microsoft, prajain@microsoft.com; Inderjit S. Dhillon, Amazon & University of Texas at Austin, isd@amazon.com
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | "Table 1: Test RMSE for recommending new users with existing movies on Movielens dataset. We use users' demographic information and movies' genre information as features x and y, respectively. We randomly split the users into existing users (training data) and new users (testing data) with ratio 4:1. Hence, we are predicting ratings for completely new users, for which only users' demographic features are available. The user features include 21 types of occupations, 7 different age ranges, and one gender feature; the movie features include 18-19 genre features (18 for ml-1m and 19 for ml-100k) and 20 features from the top 20 right singular vectors of the training rating matrix (which has size #training users by #movies). k is set to 50, and φ is the ReLU activation."
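The movie-feature construction quoted above (genre flags concatenated with the top 20 right singular vectors of the training rating matrix) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the matrix sizes and the random stand-in ratings are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training rating matrix of size (#training users x #movies);
# real data would be the Movielens training split.
R_train = rng.poisson(1.0, size=(800, 300)).astype(float)

# Economy SVD: rows of Vt are the right singular vectors of R_train.
U, s, Vt = np.linalg.svd(R_train, full_matrices=False)
movie_latent = Vt[:20].T  # (#movies, 20) latent movie features

# 18 binary genre flags per movie (the ml-1m case described in the row).
genre = rng.integers(0, 2, size=(300, 18)).astype(float)

# Final movie feature matrix y: genres plus SVD-derived latent features.
y = np.hstack([genre, movie_latent])  # shape (#movies, 38)
```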
Dataset Splits | Yes | "We randomly split the users into existing users (training data) and new users (testing data) with ratio 4:1."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | "For simplicity, we apply gradient descent with initialization W^(0) = (1 − α)W* + αW^(r), where W* is the ground truth, W^(r) is a Gaussian random matrix, and α ∈ [0, 1]. In Fig. 1 (a)(b), α = 0.1, and in Fig. 1 (c)(d), α = 1. The sampling rule for the observations follows our previous assumptions. For each (n, m) pair, we make 5 trials and take the average of the successful recovery times. We say a solution (U, V) successfully recovers the ground-truth parameters when it achieves 0.001 relative testing error, i.e., ‖φ(X_t U)φ(X_t V)^⊤ − φ(X_t U*)φ(X_t V*)^⊤‖_F ≤ 0.001 ‖φ(X_t U*)φ(X_t V*)^⊤‖_F, where X_t ∈ ℝ^{n×d} is a newly sampled testing dataset. For both ReLU and sigmoid, we minimize the original objective function (3). In (a), k = 5, d = 10, n = 1000, and m = 10000. In (b), k and d are larger (k = 10, d = 100, n = 500). In Fig. 1 (c)(d): for sigmoid, the number of samples is n1 = n2 = n ∈ {10i}_{i=1,...,10} and the number of observations is |Ω| = m ∈ {2kd·i}_{i=1,...,10}; for ReLU, n ∈ {20i}_{i=1,...,10} and m ∈ {4kd·i}_{i=1,...,10}."
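The two reproducible ingredients quoted in this row, the convex-combination initialization W^(0) = (1 − α)W* + αW^(r) and the relative-error success criterion, can be sketched as below. This is a minimal reconstruction under the assumption that the model is the bilinear form φ(XU)φ(XV)^⊤ with ReLU activation; the gradient-descent loop on objective (3) itself is not reproduced.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
k, d, n = 5, 10, 1000   # the Fig. 1(a) setting from the row
alpha = 0.1             # interpolation weight used in Fig. 1(a)(b)

# Ground-truth parameters (U*, V*) and the paper's initialization:
# W(0) = (1 - alpha) * W_true + alpha * W_random, for each factor.
U_true = rng.standard_normal((d, k))
V_true = rng.standard_normal((d, k))
U0 = (1 - alpha) * U_true + alpha * rng.standard_normal((d, k))
V0 = (1 - alpha) * V_true + alpha * rng.standard_normal((d, k))

# Success criterion: relative Frobenius error on a fresh test sample X_t.
X_t = rng.standard_normal((n, d))
M_true = relu(X_t @ U_true) @ relu(X_t @ V_true).T

def rel_test_error(U, V):
    M = relu(X_t @ U) @ relu(X_t @ V).T
    return np.linalg.norm(M - M_true) / np.linalg.norm(M_true)

# A run would count as a successful recovery when this drops below 0.001.
err0 = rel_test_error(U0, V0)
```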