Streaming Coresets for Symmetric Tensor Factorization

Authors: Rachit Chhaya, Jayesh Choudhari, Anirban Dasgupta, Supratim Shit

ICML 2020

Reproducibility assessment. Each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental. "We give empirical results that compare our sampling scheme with other schemes" (Section 1, Introduction); see also Table 2, Streaming Single Topic Modeling (Section 6, Applications).
Researcher Affiliation: Academia. "Computer Science and Engineering, Indian Institute of Technology Gandhinagar, India."
Pseudocode: Yes. Algorithm 1, Score(xi, M, Minv, Q); Algorithm 2, Line Filter; Algorithm 3, Kernel Filter (Section 4, Algorithms and Guarantees).
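The pseudocode itself is not reproduced on this page. As a rough illustration of the interface suggested by Score(xi, M, Minv, Q), here is a minimal sketch of an online leverage-score-style streaming filter; the score formula, the sampling rule, and every name below are illustrative assumptions, not the paper's exact Line Filter or Kernel Filter.

```python
import numpy as np

def score(x, M_inv):
    # Hypothetical per-row score: the online leverage-score-style quantity
    # x^T (M + x x^T)^{-1} x, obtained from M^{-1} via Sherman-Morrison.
    t = float(x @ (M_inv @ x))
    return t / (1.0 + t)

def streaming_filter_sketch(rows, d, scale=10.0, ridge=1e-6, seed=0):
    # Keep each incoming row with probability proportional to its score,
    # reweight kept rows by 1/p, and update M^{-1} with a rank-one formula
    # so each row costs O(d^2) -- a natural fit for the streaming setting.
    rng = np.random.default_rng(seed)
    M_inv = np.eye(d) / ridge            # inverse of the ridge * I start
    coreset, weights = [], []
    for x in rows:
        p = min(1.0, scale * score(x, M_inv))
        if rng.random() < p:
            coreset.append(x)
            weights.append(1.0 / p)      # importance weight for the sample
        Mx = M_inv @ x                   # Sherman-Morrison update for x x^T
        M_inv -= np.outer(Mx, Mx) / (1.0 + float(x @ Mx))
    return np.array(coreset), np.array(weights)
```

The Sherman-Morrison update avoids re-inverting M after each row, which is what makes a one-pass filter of this shape feasible; whether the paper's algorithms use exactly this update is not established by the excerpt above.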
Open Source Code: No. The paper provides no link to, or explicit statement about, publicly available source code.
Open Datasets: No. "Here we use a subset of 20Newsgroups dataset (preprocessed). We took a subset of 10K documents and considered the 100 most frequent words. We normalized each document vector, such that its ℓ1 norm is 1, and created a matrix A ∈ R^{10K×100}" (Section 6, Applications). The paper names the 20Newsgroups dataset but provides no access information (link, DOI, or formal citation with authors/year) for its preprocessed subset.
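A minimal sketch of one plausible reconstruction of this preprocessing, assuming scikit-learn's bundled 20Newsgroups loader and CountVectorizer; the paper does not specify its preprocessing pipeline or data source, so the choices below are assumptions.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# Load raw 20Newsgroups documents (the exact subset used in the paper
# is unspecified; this takes the first 10K training documents).
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data
docs = docs[:10_000]

# Keep only the 100 most frequent words across the subset.
vectorizer = CountVectorizer(max_features=100)
A = vectorizer.fit_transform(docs).toarray().astype(float)

# Normalize each document vector to unit l1 norm (skipping empty rows),
# giving a (up to) 10K x 100 matrix whose nonzero rows each sum to 1.
row_sums = A.sum(axis=1, keepdims=True)
A = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
```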
Dataset Splits: No. No explicit training, validation, or test split details (percentages, sample counts, or predefined split citations) are provided. The paper processes rows "one at a time" and evaluates the outcome, without a conventional train/validation/test partition.
Hardware Specification: No. No hardware details (GPU/CPU models, memory amounts, or other machine specifications) used for the experiments are reported.
Software Dependencies: No. No specific software dependencies with version numbers (e.g., library names with versions) are mentioned.
Experiment Setup: Yes. "Here we use a subset of 20Newsgroups dataset (preprocessed). We took a subset of 10K documents and considered the 100 most frequent words. We normalized each document vector, such that its ℓ1 norm is 1, and created a matrix A ∈ R^{10K×100}. We feed its rows one at a time to Line Filter + Kernel Filter with p = 3, which returns a coreset C. We run tensor-based single topic modeling (Anandkumar et al., 2014) on A and C, to return 12 top topic distributions from both. We take the best matching between empirical topics and estimated topics based on ℓ1 distance and compute the average ℓ1 difference between them. Here smaller is better. We run this entire method 5 times and report the median of their ℓ1 average differences" (Section 6, Applications).
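A minimal sketch of this evaluation protocol under stated assumptions: line_plus_kernel_filter and run_topic_model are hypothetical placeholders for the paper's streaming filters (with p = 3) and the tensor-based topic estimator of Anandkumar et al. (2014), and the "best matching" is computed here with a Hungarian assignment over pairwise ℓ1 distances, one natural reading of the quoted description.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def avg_l1_topic_error(topics_full, topics_coreset):
    # Best matching between the two topic sets under l1 ("cityblock")
    # distance, then the average l1 difference over matched pairs.
    # Smaller is better.
    cost = cdist(topics_full, topics_coreset, metric="cityblock")
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

def evaluate(A, line_plus_kernel_filter, run_topic_model, k=12, reps=5):
    # Hypothetical driver: repeat the whole pipeline 5 times and report
    # the median of the average l1 differences, as in the paper's setup.
    errors = []
    for seed in range(reps):
        C, w = line_plus_kernel_filter(A, p=3, seed=seed)  # stream rows of A
        topics_A = run_topic_model(A, k=k)                 # 12 topics from A
        topics_C = run_topic_model(C, k=k, weights=w)      # 12 topics from C
        errors.append(avg_l1_topic_error(topics_A, topics_C))
    return np.median(errors)
```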