DinTucker: Scaling Up Gaussian Process Models on Large Multidimensional Arrays
Authors: Shandian Zhe, Yuan Qi, Youngja Park, Zenglin Xu, Ian Molloy, Suresh Chari
AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that DINTUCKER maintains the predictive accuracy of InfTucker and is scalable on massive data: on multidimensional arrays with billions of elements from two real-world applications, DINTUCKER achieves significantly higher prediction accuracy with less training time, compared with the state-of-the-art large-scale tensor decomposition method, GigaTensor. |
| Researcher Affiliation | Collaboration | Shandian Zhe (1), Yuan Qi (1), Youngja Park (2), Zenglin Xu (3), Ian Molloy (2), and Suresh Chari (2); (1) Department of Computer Science, Purdue University; (2) IBM Thomas J. Watson Research Center; (3) School of Computer Science & Engineering, Big Data Research Center, University of Electronic Science and Technology of China |
| Pseudocode | No | The paper describes algorithmic steps in prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We implemented DINTUCKER with PYTHON and used HADOOP streaming for training and prediction.' but does not explicitly provide a link or statement of open-source code availability for the methodology described. |
| Open Datasets | Yes | The first dataset is NELL, a knowledge base containing triples (e.g., (George Harrison, playsInstrument, Guitar)) from the Read the Web project (Carlson et al. 2010). |
| Dataset Splits | Yes | Specifically, we split the nonzero entries into 5 folds and used 4 folds for training. For the test set, we used all the ones in the remaining fold and randomly chose 0.1% zero entries (so that the evaluations will not be overwhelmed by zero elements). We repeated this procedure 5 times with different training and test sets each time. ... The NELL and ACC datasets contain 0.0001% and 0.003% nonzero entries, respectively. We randomly chose 80% of nonzero entries for training and then, from the remaining entries, we sampled 50 test datasets, each of which consists of 200 nonzero entries and 2,000 zero entries. A hedged sketch of this splitting procedure appears after the table. |
| Hardware Specification | Yes | We carried out our experiments on a HADOOP cluster. The cluster consists of 16 machines, each of which has a quad-core Intel Xeon E3 3.3 GHz CPU, 8 GB RAM, and a 4 TB disk. |
| Software Dependencies | No | The paper states 'We implemented DINTUCKER with PYTHON and used HADOOP streaming for training and prediction.' but does not provide specific version numbers for Python, Hadoop, or any other software dependencies. A generic Hadoop streaming mapper skeleton in Python is sketched after the table. |
| Experiment Setup | Yes | For DINTUCKER, we set the subarray size to 40 × 40 × 40 for Digg1 and Enron, and 20 × 20 × 20 × 20 for Digg2. ... For each strategy, we sampled 1,500 subarrays for training. We ran our distributed online inference algorithm with 3 mappers, and set the number of iterations to 5. We tuned the learning rate η from the range {0.0005, 0.001, 0.002, 0.005, 0.01}. We used another cross-validation to choose the kernel function from the RBF, linear, polynomial, and Matérn functions and tuned its hyperparameters. A hedged sketch of the subarray sampling and kernel choice follows the table. |
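
The splitting procedure quoted in the Dataset Splits row can be summarized in a short sketch. This is not the authors' code: the entry representation (plain lists of index/value records), the function names, and the fixed random seed are assumptions made only for illustration.

```python
import random

def five_fold_split(nonzero_entries, zero_entries, fold, zero_rate=0.001, seed=0):
    """Train on 4 of 5 folds of the nonzero entries; test on the remaining
    fold plus a small random sample (0.1%) of zero entries."""
    rng = random.Random(seed)
    shuffled = list(nonzero_entries)
    rng.shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    train = [e for i, f in enumerate(folds) if i != fold for e in f]
    test = folds[fold] + rng.sample(zero_entries, int(len(zero_entries) * zero_rate))
    return train, test

def sample_test_sets(heldout_nonzero, zero_entries, n_sets=50, seed=0):
    """For NELL/ACC: after an 80/20 split of nonzero entries, draw 50 test
    sets, each with 200 held-out nonzero entries and 2,000 zero entries."""
    rng = random.Random(seed)
    return [rng.sample(heldout_nonzero, 200) + rng.sample(zero_entries, 2000)
            for _ in range(n_sets)]
```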
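
The paper names Python and Hadoop streaming but no versions, so only the generic streaming contract can be illustrated: Hadoop streaming pipes input records to a mapper executable on stdin and expects tab-separated key/value pairs on stdout. The record layout and the per-record work below are hypothetical placeholders, not DinTucker's actual update.

```python
#!/usr/bin/env python
# Generic Hadoop streaming mapper skeleton (hypothetical record format):
# each input line is "key<TAB>payload", and the mapper emits "key<TAB>result"
# for a reducer to aggregate.
import sys

def process(payload):
    # Placeholder for per-record work, e.g. a local update on one subarray.
    return payload

for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    key, _, payload = line.partition("\t")
    sys.stdout.write("%s\t%s\n" % (key, process(payload)))
```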
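
The Experiment Setup row fixes the subarray sizes, the number of sampled subarrays (1,500), a learning-rate grid, and a cross-validated kernel choice. The sketch below shows one way to draw random subarray indices and an RBF kernel over the rows of a latent factor matrix; the uniform sampling strategy, the function names, and the array dimensions are assumptions for illustration only.

```python
import numpy as np

def sample_subarray_indices(dims, sub_dims, rng):
    """Draw, for each tensor mode, a random index subset that defines one
    small subarray (e.g., 40 x 40 x 40 out of the full array)."""
    return [rng.choice(d, size=s, replace=False) for d, s in zip(dims, sub_dims)]

def rbf_kernel(U, lengthscale=1.0):
    """RBF (squared-exponential) kernel between rows of a latent factor matrix."""
    sq = np.sum(U ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * U @ U.T
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
dims, sub_dims = (1000, 800, 600), (40, 40, 40)           # illustrative sizes
subarrays = [sample_subarray_indices(dims, sub_dims, rng) for _ in range(1500)]
learning_rate_grid = [0.0005, 0.001, 0.002, 0.005, 0.01]  # grid from the paper
```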