Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Ellipsoidal Rounding for Nonnegative Matrix Factorization Under Noisy Separability
Authors: Tomohiko Mizutani
JMLR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we apply the algorithm to document clustering, and report the experimental results. |
| Researcher Affiliation | Academia | Tomohiko Mizutani (EMAIL), Department of Information Systems Creation, Kanagawa University, Yokohama, 221-8686, Japan |
| Pseudocode | Yes | Algorithm 1 Ellipsoidal Rounding (ER) for Problem 1 Algorithm 2 Practical Implementation of Algorithm 1 Algorithm 3 Cutting Plane Strategy for Solving Q(S) |
| Open Source Code | No | The paper states: 'We implemented Algorithm 2, and three variants of XRAY, max, dist and greedy, in MATLAB.' and 'For the implementation of SPA (Gillis and Vavasis, 2014), we used code from the first author's website.' This indicates implementation but no explicit statement of releasing the authors' own code or providing a link to it. |
| Open Datasets | Yes | Two document corpora were used for the clustering-performance evaluation: Reuters21578 and 20 Newsgroups. These corpora are publicly available from the UCI Knowledge Discovery in Databases Archive (http://kdd.ics.uci.edu). ... The data sets are available from the website (http://www.cad.zju.edu.cn/home/dengcai). ... We used the BBC corpus of Greene and Cunningham (2006), which is available from the website (http://mlg.ucd.ie/datasets/bbc.html). |
| Dataset Splits | No | The paper describes the total size and characteristics of the datasets (e.g., 'Reuters21578 corpus consists of 21,578 documents... The resulting corpus contains 8,258 documents with 18,931 words in 48 classes'), but it does not specify explicit training, validation, or test splits. It mentions random sampling of classes for evaluation but not data partitioning for model training/testing. |
| Hardware Specification | Yes | All experiments were done in MATLAB on a 3.2 GHz CPU processor and 12 GB memory. |
| Software Dependencies | No | The paper mentions 'MATLAB' and 'The software package SDPT3 (Toh et al., 1999)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The shrinking parameter θ and expanding size parameter η were set as 0.9999 and 5, respectively. ... The parameter δ determined the intensity of the noise, and it was chosen from 0 to 0.5 in 0.01 increments. ... Algorithm 2 was performed in the setting that M is a matrix in the data set and r and ρ are each 10. ... We assigned the values of the document-word matrix on the basis of the tf-idf weighting scheme... and normalized the row vectors to the unit 1-norm. |
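The preprocessing described in the Experiment Setup row (tf-idf weighting of the document-word matrix followed by normalizing each row to unit 1-norm) can be sketched as follows. This is a minimal illustration, not the paper's code: the exact tf-idf variant (raw term frequency, log inverse document frequency) is an assumption, and `tfidf_l1_rows` is a hypothetical helper name.

```python
import numpy as np

def tfidf_l1_rows(counts):
    """Build a document-word matrix with tf-idf weights, then
    normalize each row (document) to unit 1-norm.

    counts: (n_docs, n_words) array of raw term counts.
    The tf-idf convention here (raw tf, log n_docs/df idf) is one
    common choice and is assumed, not taken from the paper.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    # Document frequency: number of documents containing each word.
    df = np.count_nonzero(counts, axis=0)
    idf = np.log(n_docs / np.maximum(df, 1))
    M = counts * idf
    # Normalize rows to unit 1-norm; guard against all-zero rows.
    row_sums = np.abs(M).sum(axis=1, keepdims=True)
    return M / np.maximum(row_sums, np.finfo(float).tiny)
```

After this step, each nonzero row sums to 1, so every document is represented as a distribution over weighted words before the factorization algorithm is applied.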