On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations

Authors: Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate its accuracy on both simulated and real-world datasets.
Researcher Affiliation | Academia | Department of Computer Science; Department of Statistics and Data Sciences; Department of Information, Risk, and Operations Management; The University of Texas at Austin, TX, USA.
Pseudocode | Yes | Algorithm 1 GeoNMF. Input: adjacency matrix A; number of communities K; a constant ϵ₀. Output: estimated node-community distribution matrix Θ̂, community-community interaction matrix B̂, sparsity-control parameter ρ̂; ... Algorithm 2 Partition Pure Nodes. Input: matrix M ∈ ℝ^{m×K}, where each row represents a pure node; a constant τ. Output: a set S consisting of one pure node from each cluster.
Open Source Code | No | The paper states that the co-authorship datasets are available at a URL, but it gives no statement or link for the source code of the GeoNMF method itself.
Open Datasets | Yes | For the DBLP and Microsoft Academic networks we construct a row of Θ by normalizing the number of papers an author has in different conferences (ground truth communities). We preprocessed the networks by recursively removing isolated nodes, communities without any pure nodes, and nodes with no community assignments. ... For the co-authorship networks, all communities have enough pure nodes, and after removing isolated nodes, the networks have more than 200 nodes and n/K is larger than 100. ... Available at http://www.cs.utexas.edu/~xmao/coauthorship
Dataset Splits | No | The paper does not specify how the datasets were split into training, validation, and test sets (e.g., percentages, sample counts, or split files); it only states that it evaluates on simulated and real-world datasets.
Hardware Specification | No | The paper does not give hardware details (e.g., CPU/GPU models, memory) for the experiments. It mentions a GPU only in a footnote about another algorithm's implementation.
Software Dependencies | No | The paper uses MCMC techniques, variational methods (SVI), BSNMF, OCCAM, and SAAC as baselines, but it does not specify software versions for these or for its own GeoNMF implementation.
Experiment Setup | Yes | Unless otherwise stated, we set n = 5000, K = 3, and α₀ = 1. ... We choose ϵ₀ = O_P(ϵ*), and it is straightforward to show by Lemmas 4.1 and 4.3 and Theorem 4.2 that if ϵ₀ ≥ 2ϵ*, then F includes all pure nodes from all K communities. ... Let τ = √(K/(4n)) · (min_{i∈F} D²(i,i)) / (max_{i∈F} D²(i,i)) ... we construct the candidate pure node set F (step 5 of Algorithm 1) by finding all nodes with norm within ϵ₀ multiplicative error of the largest norm. We increase ϵ₀ from a small value until X̂_p has condition number close to one.
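The Open Datasets row describes two steps for the DBLP and Microsoft Academic networks:

- Each row of Θ is built by normalizing an author's per-conference paper counts.
- Before that, isolated nodes, communities without pure nodes, and nodes with no community assignment are removed recursively.

The Python sketch below is one hypothetical reading of that preprocessing, not the authors' code. It makes these assumptions:

- It takes a symmetric 0/1 adjacency matrix `A` and an author-by-conference count matrix `C`; both names are ours.
- A node counts as pure when all of its papers fall in a single conference.
- The three removal rules are applied together, and the pass repeats until nothing changes.

```python
import numpy as np


def preprocess(A, C):
    """Hypothetical cleanup loop for a co-authorship network.

    A: symmetric 0/1 adjacency matrix (n x n), zero diagonal assumed.
    C: author-by-conference paper counts (n x K_raw).
    Returns kept node indices, kept community indices, and Theta.
    """
    A = np.asarray(A)
    C = np.asarray(C, dtype=float)
    nodes, comms = np.arange(A.shape[0]), np.arange(C.shape[1])
    while True:
        Asub = A[np.ix_(nodes, nodes)]
        Csub = C[np.ix_(nodes, comms)]
        totals = Csub.sum(axis=1)
        # Drop isolated nodes and nodes with no community assignment.
        keep_nodes = (Asub.sum(axis=1) > 0) & (totals > 0)
        # Assumed purity rule: all of a node's papers are in one conference.
        pure = (totals > 0) & (np.count_nonzero(Csub, axis=1) == 1)
        # Keep only communities that still contain at least one pure node.
        keep_comms = np.zeros(comms.size, dtype=bool)
        if pure.any():
            keep_comms[np.argmax(Csub[pure], axis=1)] = True
        if keep_nodes.all() and keep_comms.all():
            break  # fixed point reached: nothing left to remove
        nodes, comms = nodes[keep_nodes], comms[keep_comms]
    Csub = C[np.ix_(nodes, comms)]
    # Each row of Theta: the author's paper counts normalized to sum to one.
    Theta = Csub / Csub.sum(axis=1, keepdims=True)
    return nodes, comms, Theta
```

Looping to a fixed point is one way to honor "recursively". Removing a community can leave some authors with no papers, or turn a mixed author into a pure one, so a single pass is not enough. The report does not say in which order the rules are applied, so other orderings may keep slightly different node sets.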
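This sketch makes two things concrete: the quantities Algorithm 1 estimates (Θ̂, B̂, ρ̂) and the default setting in the Experiment Setup row (n = 5000, K = 3, α₀ = 1). It samples a graph from a mixed-membership model with expected adjacency ρΘBΘᵀ. It is not the paper's simulation protocol, which this report does not describe. Several choices are assumptions:

- α₀ is read as the total of a symmetric Dirichlet parameter, so each community gets α₀/K.
- B defaults to the identity.
- ρ defaults to 0.1.
- The hypothetical `n_pure` option plants pure nodes so every community has some.

```python
import numpy as np


def sample_mixed_membership_graph(n=5000, K=3, alpha0=1.0, rho=0.1,
                                  B=None, n_pure=0, seed=0):
    """Sample a symmetric adjacency matrix whose off-diagonal entries are
    Bernoulli(P[i, j]) with P = rho * Theta B Theta^T.
    All modeling choices here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # Assumption: symmetric Dirichlet with parameter alpha0 / K per community.
    Theta = rng.dirichlet(np.full(K, alpha0 / K), size=n)
    # Hypothetical option: nodes k*n_pure .. (k+1)*n_pure - 1 become pure
    # members of community k.
    for k in range(K):
        Theta[k * n_pure:(k + 1) * n_pure] = np.eye(K)[k]
    if B is None:
        B = np.eye(K)  # assumption: purely assortative communities
    P = rho * Theta @ B @ Theta.T
    # Independent draws for i < j, mirrored to make A symmetric; zero diagonal.
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(np.int8)
    return A, Theta, P
```

For example, `A, Theta, P = sample_mixed_membership_graph(n=500, n_pure=20)` gives a small graph for quick checks. P and the uniform draws are dense n × n arrays, so n = 5000 needs a few hundred megabytes of memory.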
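The end of the Experiment Setup row describes how the candidate pure-node set F is chosen. Nodes whose row norm is within ϵ₀ multiplicative error of the largest norm are kept, and ϵ₀ is raised until X̂_p has condition number close to one. The sketch below illustrates that sweep under these assumptions:

- "Within ϵ₀ multiplicative error" is read as ‖x_i‖ ≥ (1 − ϵ₀)·max_j ‖x_j‖.
- Algorithm 2 (Partition Pure Nodes), whose details this report does not give, is replaced by a hypothetical greedy rule that uses τ as a distance threshold.
- The ϵ₀ grid and the tolerance `cond_tol` are illustrative.

```python
import numpy as np


def candidate_pure_nodes(X, eps0):
    """Indices of rows whose norm is at least (1 - eps0) times the largest
    row norm (assumed reading of "within eps0 multiplicative error")."""
    norms = np.linalg.norm(X, axis=1)
    return np.flatnonzero(norms >= (1.0 - eps0) * norms.max())


def greedy_partition(M, tau):
    """Hypothetical stand-in for Partition Pure Nodes: scan rows in order and
    keep a row as a new representative when it is farther than tau from every
    representative kept so far. Returns positions within M."""
    reps = []
    for i, row in enumerate(M):
        if all(np.linalg.norm(row - M[j]) > tau for j in reps):
            reps.append(i)
    return reps


def select_pure_nodes(X, K, eps_grid, tau=0.5, cond_tol=0.1):
    """Increase eps0 over eps_grid; return (eps0, S) for the first value giving
    exactly K representatives whose rows X[S] have condition number within
    cond_tol of one, or None if no value qualifies."""
    for eps0 in sorted(eps_grid):
        F = candidate_pure_nodes(X, eps0)
        reps = greedy_partition(X[F], tau)
        if len(reps) != K:
            continue  # candidates do not cover exactly K clusters yet
        S = F[reps]
        if np.linalg.cond(X[S]) <= 1.0 + cond_tol:
            return eps0, S
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 3
    # Toy memberships: 10 pure nodes per community, then 300 mixed nodes.
    Theta = np.vstack([np.repeat(np.eye(K), 10, axis=0),
                       rng.dirichlet(np.ones(K), size=300)])
    # With orthonormal R, ||x_i|| = ||theta_i||_2, which is 1 only for pure nodes.
    R, _ = np.linalg.qr(rng.standard_normal((K, K)))
    X = Theta @ R
    eps0, S = select_pure_nodes(X, K, eps_grid=[0.01, 0.02, 0.05, 0.1])
    print(eps0, np.argmax(Theta[S], axis=1))
```

In the toy example, X = ΘR with an orthonormal R, so pure nodes are exactly the rows of largest norm. The smallest ϵ₀ in the grid already yields one representative per community, and X[S] is a row permutation of R with condition number one. On real, noisy embeddings, a too-small ϵ₀ tends to miss whole communities, which is why the sweep starts small and grows.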