Partitioned Tensor Factorizations for Learning Mixed Membership Models

Authors: Zilong Tan, Sayan Mukherjee

ICML 2017

Reproducibility assessment (variable: result, with the supporting LLM response):

Research Type: Experimental
    LLM Response: "Our approach obtains competitive empirical results on both simulated and real data. ... 6. Results on real and simulated data"

Researcher Affiliation: Academia
    LLM Response: "Zilong Tan 1 Sayan Mukherjee 1 ... 1Duke University, Durham, NC."

Pseudocode: Yes
    LLM Response: "Algorithm 1 Factorize (M, k, d)"

Open Source Code: Yes
    LLM Response: "The code to reproduce the experiments is available at: https://goo.gl/3DBXIo."

Open Datasets: Yes
    LLM Response: "We adapt a simulation study from (Zhao et al., 2016)... We compare the predictive performance on five data sets of several tensor decomposition methods as well as the EM algorithm initialized with majority voting by the workers (MV+EM). The task is to predict the true label given incomplete and noisy observations from a set of workers; this is a mixed membership problem (Dawid & Skene, 1979). In (Zhang et al., 2014) a third-order tensor estimator was proposed to obtain an initial estimate for the EM algorithm."

Dataset Splits: No
    LLM Response: The paper discusses simulated and real-world datasets with varying sample sizes, but it does not specify explicit training, validation, and test splits (e.g., percentages, counts, or a split methodology) for its experiments.

Hardware Specification: Yes
    LLM Response: "On a laptop with Intel i7-4702HQ@2.20GHz CPU and 8GB memory"

Software Dependencies: No
    LLM Response: The paper mentions using "online code provided by the corresponding authors" and refers to existing methods such as hals (Kim et al., 2014) and meld (Zhao et al., 2016), but it does not provide version numbers for any software dependencies or libraries.

Experiment Setup: Yes
    LLM Response: "We adapt a simulation study from (Zhao et al., 2016)... We consider a GDLM where each variable takes categorical values {0, 1, 2, 3} and the parameters of the Dirichlet mixing distribution are {α_j = 0.1}_{j=1}^{k}. We initially consider 25 variables... We vary the number of components k and add noise by replacing a fraction δ of the observations with draws from a discrete uniform distribution. We also vary the number of samples n = 100, 500, 1000, 5000, number of clusters k = 3, 5, 10, 20, and contamination δ = 0, 0.05, 0.1."
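The experiment setup quoted above describes the data-generating process closely enough to sketch. The snippet below is a minimal, hypothetical simulation of a Dirichlet mixed-membership model over categorical variables in {0, 1, 2, 3}, with a fraction δ of entries replaced by uniform noise; the function name, defaults, and the specific per-component categorical parameterization are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def simulate_gdlm(n=500, p=25, k=5, n_cats=4, alpha=0.1, delta=0.05, seed=0):
    """Hypothetical sketch of the paper's simulation setup: mixed-membership
    categorical data with Dirichlet({alpha}) weights and uniform contamination."""
    rng = np.random.default_rng(seed)
    # Per-component categorical distributions over {0, ..., n_cats - 1}
    # (an assumed parameterization; the paper does not spell this out).
    comp = rng.dirichlet(np.ones(n_cats), size=(k, p))   # shape (k, p, n_cats)
    # Per-sample mixed-membership weights drawn from Dirichlet({alpha_j = alpha}).
    theta = rng.dirichlet(np.full(k, alpha), size=n)     # shape (n, k)
    X = np.empty((n, p), dtype=int)
    for i in range(n):
        # Each variable picks its own component, then a categorical value.
        z = rng.choice(k, size=p, p=theta[i])
        for j in range(p):
            X[i, j] = rng.choice(n_cats, p=comp[z[j], j])
    # Contaminate a fraction delta of observations with discrete-uniform draws.
    mask = rng.random((n, p)) < delta
    X[mask] = rng.integers(0, n_cats, size=mask.sum())
    return X

# One configuration from the quoted grid: n = 100, k = 3, delta = 0.1.
X = simulate_gdlm(n=100, p=25, k=3, delta=0.1)
print(X.shape)
```

Varying n, k, and delta over the grids quoted in the setup (n ∈ {100, 500, 1000, 5000}, k ∈ {3, 5, 10, 20}, δ ∈ {0, 0.05, 0.1}) reproduces the shape of the study, though not the authors' exact draws.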