Efficient Nonparametric Tensor Decomposition for Binary and Count Data
Authors: Zerui Tao, Toshihisa Tanaka, Qibin Zhao
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on several real-world tensor completion tasks, considering binary and count datasets. The results manifest both better performance and computational advantages of the proposed model. |
| Researcher Affiliation | Academia | 1Tokyo University of Agriculture and Technology, Japan 2RIKEN Center for Advanced Intelligence Project (AIP), Japan |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It describes the model and inference steps in prose. |
| Open Source Code | Yes | The code is mainly based on PyTorch (Paszke et al. 2019) and available at https://github.com/taozerui/gptd |
| Open Datasets | Yes | Datasets: We test our model on three binary tensor datasets: (1) Digg (Xu, Yan, and Qi 2012)... (2) Enron (Xu, Yan, and Qi 2012)... (3) DBLP (Zhe et al. 2016)... We evaluate the proposed model on three count tensors. (1) JHU (Dong, Du, and Gardner 2020)... (2) Article (Zhe and Du 2018)... (3) EMS (Zhe and Du 2018)... |
| Dataset Splits | Yes | For Digg and Enron, we randomly sample an equal number of zero entries to obtain a balanced dataset. For DBLP, the same train/test split as Zhe et al. (2016) is adopted. For binary datasets, we evaluate the area under the ROC curve (AUC) and the negative log-likelihood (NLL) of the estimated Bernoulli distributions. We report the mean and standard deviation of 5-fold cross-validation. The data is fully observed and we use 20% of observations to predict the remaining entries. (An evaluation sketch follows this table.) |
| Hardware Specification | Yes | All experiments are conducted on a workstation with an Intel Xeon Silver 4316 CPU@2.30GHz, 512GB RAM and NVIDIA RTX A6000 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al. 2019)" but does not specify a version number for PyTorch or other software dependencies. |
| Experiment Setup | Yes | All stochastic methods are optimized using batch size 128. Moreover, gradient-based models are optimized using Adam with a learning rate chosen from {3×10⁻³, 1×10⁻³, 3×10⁻⁴, 1×10⁻⁴}, except GCP, whose default optimizer is L-BFGS. We test all methods with different tensor ranks ranging over {3, 5, 10}. For GP-based methods, we use 100 inducing points and an RBF kernel with bandwidth 1.0, consistent with previous work (Zhe et al. 2016; Zhe and Du 2018). Note that, for ENTED, the inducing point number is 50 + 50 for u and v, respectively. (A configuration sketch follows this table.) |
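
The Dataset Splits row quotes AUC and Bernoulli NLL as the binary-task metrics, together with balanced negative sampling for Digg and Enron. Below is a minimal sketch of how those quantities are typically computed; all function and variable names are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the binary evaluation metrics (AUC and Bernoulli NLL)
# and the balanced negative sampling described for Digg/Enron.
# Names are illustrative; the authors' code is at https://github.com/taozerui/gptd.
import numpy as np
from sklearn.metrics import roc_auc_score


def bernoulli_nll(y_true: np.ndarray, p_pred: np.ndarray, eps: float = 1e-8) -> float:
    """Mean negative log-likelihood of predicted Bernoulli probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def evaluate_fold(y_true: np.ndarray, p_pred: np.ndarray) -> dict:
    """AUC and NLL on the held-out entries of one cross-validation fold."""
    return {"auc": roc_auc_score(y_true, p_pred), "nll": bernoulli_nll(y_true, p_pred)}


def balance_entries(pos_idx: np.ndarray, zero_idx: np.ndarray, seed: int = 0):
    """Pair each observed non-zero entry with a randomly sampled zero entry."""
    rng = np.random.default_rng(seed)
    neg_idx = zero_idx[rng.choice(len(zero_idx), size=len(pos_idx), replace=False)]
    indices = np.concatenate([pos_idx, neg_idx])
    labels = np.concatenate([np.ones(len(pos_idx)), np.zeros(len(neg_idx))])
    return indices, labels
```

Under the quoted 5-fold cross-validation protocol, `evaluate_fold` would be called once per fold and the mean and standard deviation of AUC and NLL reported.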
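
The Experiment Setup row lists concrete hyperparameters: batch size 128, an Adam learning-rate grid, tensor ranks {3, 5, 10}, 100 inducing points, and RBF bandwidth 1.0. The sketch below only enumerates that grid with stand-in latent factors and a generic RBF kernel; it does not reproduce the actual model or training loop from the released repository.

```python
# Sketch of the quoted hyperparameter grid; the latent factors below are
# stand-ins for the actual GP-based model (https://github.com/taozerui/gptd).
import itertools
import torch

BATCH_SIZE = 128
LEARNING_RATES = [3e-3, 1e-3, 3e-4, 1e-4]  # Adam learning-rate grid
TENSOR_RANKS = [3, 5, 10]
NUM_INDUCING = 100                         # 50 + 50 for u and v in the case of ENTED
RBF_BANDWIDTH = 1.0


def rbf_kernel(x1: torch.Tensor, x2: torch.Tensor, bandwidth: float = RBF_BANDWIDTH) -> torch.Tensor:
    """RBF (squared-exponential) kernel with the fixed bandwidth of 1.0."""
    sq_dist = torch.cdist(x1, x2).pow(2)
    return torch.exp(-0.5 * sq_dist / bandwidth ** 2)


for rank, lr in itertools.product(TENSOR_RANKS, LEARNING_RATES):
    # Stand-in parameters: one latent factor matrix per mode of a 3-way tensor.
    factors = [torch.randn(20, rank, requires_grad=True) for _ in range(3)]
    optimizer = torch.optim.Adam(factors, lr=lr)
    # Mini-batches of BATCH_SIZE observed entries would be drawn here to
    # optimize the model objective with `optimizer`.
```

GCP, noted as the exception in the quote, would swap the Adam step for its default L-BFGS optimizer.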