Partitioned Tensor Factorizations for Learning Mixed Membership Models

Authors: Zilong Tan, Sayan Mukherjee

ICML 2017

Reproducibility assessment (variable: result, with the supporting LLM response):

Research Type: Experimental
    LLM Response: "Our approach obtains competitive empirical results on both simulated and real data. ... 6. Results on real and simulated data"

Researcher Affiliation: Academia
    LLM Response: "Zilong Tan 1 Sayan Mukherjee 1 ... 1Duke University, Durham, NC."

Pseudocode: Yes
    LLM Response: "Algorithm 1 Factorize (M, k, d)"

Open Source Code: Yes
    LLM Response: "The code to reproduce the experiments is available at: https://goo.gl/3DBXIo."

Open Datasets: Yes
    LLM Response: "We adapt a simulation study from (Zhao et al., 2016)... We compare the predictive performance on five data sets of several tensor decomposition methods as well as the EM algorithm initialized with majority voting by the workers (MV+EM). The task is to predict the true label given incomplete and noisy observations from a set of workers; this is a mixed membership problem (Dawid & Skene, 1979). In (Zhang et al., 2014) a third-order tensor estimator was proposed to obtain an initial estimate for the EM algorithm."

Dataset Splits: No
    LLM Response: The paper discusses simulated and real-world datasets with varying sample sizes, but it does not specify explicit training, validation, and test splits (e.g., percentages, counts, or a split methodology) for its experiments.

Hardware Specification: Yes
    LLM Response: "On a laptop with Intel i7-4702HQ@2.20GHz CPU and 8GB memory"

Software Dependencies: No
    LLM Response: The paper mentions using "online code provided by the corresponding authors" and refers to existing methods such as hals (Kim et al., 2014) and meld (Zhao et al., 2016), but it does not provide version numbers for any software dependencies or libraries.

Experiment Setup: Yes
    LLM Response: "We adapt a simulation study from (Zhao et al., 2016)... We consider a GDLM where each variable takes categorical values {0, 1, 2, 3} and the parameters of the Dirichlet mixing distribution are {α_j = 0.1}_{j=1}^{k}. We initially consider 25 variables... We vary the number of components k and add noise by replacing a fraction δ of the observations with draws from a discrete uniform distribution. We also vary the number of samples n = 100, 500, 1000, 5000, number of clusters k = 3, 5, 10, 20, and contamination δ = 0, 0.05, 0.1."
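The experiment setup quoted above describes the data-generating process closely enough to sketch. The snippet below is a minimal, hypothetical simulation of a Dirichlet mixed-membership model over categorical variables in {0, 1, 2, 3}, with a fraction δ of entries replaced by uniform noise; the function name, defaults, and the specific per-component categorical parameterization are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def simulate_gdlm(n=500, p=25, k=5, n_cats=4, alpha=0.1, delta=0.05, seed=0):
    """Hypothetical sketch of the paper's simulation setup: mixed-membership
    categorical data with Dirichlet({alpha}) weights and uniform contamination."""
    rng = np.random.default_rng(seed)
    # Per-component categorical distributions over {0, ..., n_cats - 1}
    # (an assumed parameterization; the paper does not spell this out).
    comp = rng.dirichlet(np.ones(n_cats), size=(k, p))   # shape (k, p, n_cats)
    # Per-sample mixed-membership weights drawn from Dirichlet({alpha_j = alpha}).
    theta = rng.dirichlet(np.full(k, alpha), size=n)     # shape (n, k)
    X = np.empty((n, p), dtype=int)
    for i in range(n):
        # Each variable picks its own component, then a categorical value.
        z = rng.choice(k, size=p, p=theta[i])
        for j in range(p):
            X[i, j] = rng.choice(n_cats, p=comp[z[j], j])
    # Contaminate a fraction delta of observations with discrete-uniform draws.
    mask = rng.random((n, p)) < delta
    X[mask] = rng.integers(0, n_cats, size=mask.sum())
    return X

# One configuration from the quoted grid: n = 100, k = 3, delta = 0.1.
X = simulate_gdlm(n=100, p=25, k=3, delta=0.1)
print(X.shape)
```

Varying n, k, and delta over the grids quoted in the setup (n ∈ {100, 500, 1000, 5000}, k ∈ {3, 5, 10, 20}, δ ∈ {0, 0.05, 0.1}) reproduces the shape of the study, though not the authors' exact draws.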