Provable Non-linear Inductive Matrix Completion
Authors: Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we apply our model on synthetic datasets and verify our theoretical analysis. In this section, we generate some synthetic datasets to verify the sample complexity and the convergence of gradient descent. For simplicity, we apply gradient descent with initialization W^(0) = (1 − α)W^* + αW^(r), where W^* is the ground truth, W^(r) is a Gaussian random matrix, and α ∈ [0, 1]. In Fig. 1 (a)(b), α = 0.1, and in Fig. 1 (c)(d), α = 1. The sampling rule for the observations follows our previous assumptions. For each (n, m) pair, we make 5 trials and take the average of the successful recovery times. We say a solution (U, V) successfully recovers the ground-truth parameters when the solution achieves 0.001 relative testing error, i.e., ‖φ(X_t U)φ(X_t U)^⊤ − φ(X_t U^*)φ(X_t U^*)^⊤‖_F ≤ 0.001 ‖φ(X_t U^*)φ(X_t U^*)^⊤‖_F, where X_t ∈ ℝ^{n×d} is a newly sampled testing dataset. For both ReLU and sigmoid, we minimize the original objective function (3). |
| Researcher Affiliation | Collaboration | Kai Zhong, Amazon (kaizhong@amazon.com); Zhao Song, University of Washington (magic.linuxkde@gmail.com); Prateek Jain, Microsoft (prajain@microsoft.com); Inderjit S. Dhillon, Amazon & University of Texas at Austin (isd@amazon.com) |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Table 1: Test RMSE for recommending new users with existing movies on the MovieLens dataset. We use users' demographic information and movies' genre information as features x and y, respectively. We randomly split the users into existing users (training data) and new users (testing data) with ratio 4:1. Hence, we are predicting ratings for completely new users, for which only user demographic features are available. The user features include 21 types of occupations, 7 different age ranges, and one gender feature; the movie features include 18-19 genre features (18 for ml-1m and 19 for ml-100k) and 20 features from the top 20 right singular vectors of the training rating matrix (which has size #training users × #movies). k is set to 50, and φ is the ReLU activation. |
| Dataset Splits | Yes | We randomly split the users into existing users (training data) and new users (testing data) with ratio 4:1. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | For simplicity, we apply gradient descent with initialization W^(0) = (1 − α)W^* + αW^(r), where W^* is the ground truth, W^(r) is a Gaussian random matrix, and α ∈ [0, 1]. In Fig. 1 (a)(b), α = 0.1, and in Fig. 1 (c)(d), α = 1. The sampling rule for the observations follows our previous assumptions. For each (n, m) pair, we make 5 trials and take the average of the successful recovery times. We say a solution (U, V) successfully recovers the ground-truth parameters when the solution achieves 0.001 relative testing error, i.e., ‖φ(X_t U)φ(X_t U)^⊤ − φ(X_t U^*)φ(X_t U^*)^⊤‖_F ≤ 0.001 ‖φ(X_t U^*)φ(X_t U^*)^⊤‖_F, where X_t ∈ ℝ^{n×d} is a newly sampled testing dataset. For both ReLU and sigmoid, we minimize the original objective function (3). In (a), k = 5, d = 10, n = 1000, and m = 10000. In (b), k and d are large (k = 10, d = 100, n = 500). In Fig. 1 (c)(d), for sigmoid we set the number of samples n_1 = n_2 = n = {10·i}_{i=1,2,…,10} and the number of observations \|Ω\| = m = {2kd·i}_{i=1,2,…,10}; for ReLU, we set n = {20·i}_{i=1,2,…,10} and m = {4kd·i}_{i=1,2,…,10}. |
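As a reading aid, the interpolated initialization W^(0) = (1 − α)W^* + αW^(r) and the 0.001 relative-test-error recovery criterion quoted in the table above can be sketched in NumPy. This is a minimal illustration under assumed toy sizes, not the authors' code; the names `relu`, `init_interpolated`, and `relative_test_error` are ours:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_interpolated(W_star, alpha, rng):
    """W^(0) = (1 - alpha) * W^* + alpha * W^(r), with W^(r) Gaussian."""
    W_rand = rng.standard_normal(W_star.shape)
    return (1.0 - alpha) * W_star + alpha * W_rand

def relative_test_error(U, U_star, X_t, phi=relu):
    """Frobenius-norm ratio ||phi(X_t U)phi(X_t U)^T - phi(X_t U*)phi(X_t U*)^T||_F
    / ||phi(X_t U*)phi(X_t U*)^T||_F on a fresh test sample X_t."""
    M = phi(X_t @ U) @ phi(X_t @ U).T
    M_star = phi(X_t @ U_star) @ phi(X_t @ U_star).T
    return np.linalg.norm(M - M_star) / np.linalg.norm(M_star)

rng = np.random.default_rng(0)
d, k, n = 10, 5, 1000                       # sizes quoted for Fig. 1(a)
U_star = rng.standard_normal((d, k))        # ground-truth parameters
U0 = init_interpolated(U_star, alpha=0.1, rng=rng)
X_t = rng.standard_normal((n, d))           # newly sampled testing dataset
err = relative_test_error(U0, U_star, X_t)
recovered = err <= 1e-3                     # success criterion from the paper
```

In the paper the criterion is applied to the solution returned by gradient descent; here it is evaluated at the initialization only, to show the quantities involved.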
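The MovieLens setup described in the Open Datasets and Dataset Splits rows (a 4:1 random split of users into existing/training and new/testing, plus 20 movie features taken from the top right singular vectors of the training rating matrix) could be sketched as follows. The toy sizes and random ratings are placeholders, not the actual MovieLens data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_movies = 200, 80                 # toy sizes; real data is ml-100k / ml-1m
ratings = rng.random((n_users, n_movies))   # placeholder rating matrix

# 4:1 random split of users into existing (training) and new (testing)
perm = rng.permutation(n_users)
n_train = int(0.8 * n_users)
train_users, test_users = perm[:n_train], perm[n_train:]
R_train = ratings[train_users]              # size: #training users x #movies

# Top-20 right singular vectors of R_train give a 20-dim feature per movie.
_, _, Vt = np.linalg.svd(R_train, full_matrices=False)
movie_svd_features = Vt[:20].T              # shape: (n_movies, 20)
```

These SVD features would then be concatenated with the 18-19 genre features to form the movie side-information y used by the model.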