Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Ellipsoidal Rounding for Nonnegative Matrix Factorization Under Noisy Separability
Authors: Tomohiko Mizutani
JMLR 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we apply the algorithm to document clustering, and report the experimental results. |
| Researcher Affiliation | Academia | Tomohiko Mizutani (EMAIL), Department of Information Systems Creation, Kanagawa University, Yokohama, 221-8686, Japan |
| Pseudocode | Yes | Algorithm 1 Ellipsoidal Rounding (ER) for Problem 1 Algorithm 2 Practical Implementation of Algorithm 1 Algorithm 3 Cutting Plane Strategy for Solving Q(S) |
| Open Source Code | No | The paper states: 'We implemented Algorithm 2, and three variants of XRAY, max, dist and greedy, in MATLAB.' and 'For the implementation of SPA (Gillis and Vavasis, 2014), we used code from the first author's website.' This indicates implementation but no explicit statement of releasing the authors' own code or providing a link to it. |
| Open Datasets | Yes | Two document corpora were used for the clustering-performance evaluation: Reuters21578 and 20 Newsgroups. These corpora are publicly available from the UCI Knowledge Discovery in Databases Archive (http://kdd.ics.uci.edu). ... The data sets are available from the website (http://www.cad.zju.edu.cn/home/dengcai). ... We used the BBC corpus of Greene and Cunningham (2006), which is available from the website (http://mlg.ucd.ie/datasets/bbc.html). |
| Dataset Splits | No | The paper describes the total size and characteristics of the datasets (e.g., 'Reuters21578 corpus consists of 21,578 documents... The resulting corpus contains 8,258 documents with 18,931 words in 48 classes'), but it does not specify explicit training, validation, or test splits. It mentions random sampling of classes for evaluation but not data partitioning for model training/testing. |
| Hardware Specification | Yes | All experiments were done in MATLAB on a 3.2 GHz CPU processor and 12 GB memory. |
| Software Dependencies | No | The paper mentions 'MATLAB' and 'The software package SDPT3 (Toh et al., 1999)' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The shrinking parameter θ and expanding size parameter η were set as 0.9999 and 5, respectively. ... The parameter δ determined the intensity of the noise, and it was chosen from 0 to 0.5 in 0.01 increments. ... Algorithm 2 was performed in the setting that M is a matrix in the data set and r and ρ are each 10. ... We assigned the values of the document-word matrix on the basis of the tf-idf weighting scheme... and normalized the row vectors to the unit 1-norm. |
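The preprocessing described in the Experiment Setup row (tf-idf weighting of the document-word matrix followed by normalizing each row to unit 1-norm) can be sketched as follows. This is a minimal illustration, not the paper's code: the exact tf-idf variant (raw term frequency, log inverse document frequency) is an assumption, and `tfidf_l1_rows` is a hypothetical helper name.

```python
import numpy as np

def tfidf_l1_rows(counts):
    """Build a document-word matrix with tf-idf weights, then
    normalize each row (document) to unit 1-norm.

    counts: (n_docs, n_words) array of raw term counts.
    The tf-idf convention here (raw tf, log n_docs/df idf) is one
    common choice and is assumed, not taken from the paper.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    # Document frequency: number of documents containing each word.
    df = np.count_nonzero(counts, axis=0)
    idf = np.log(n_docs / np.maximum(df, 1))
    M = counts * idf
    # Normalize rows to unit 1-norm; guard against all-zero rows.
    row_sums = np.abs(M).sum(axis=1, keepdims=True)
    return M / np.maximum(row_sums, np.finfo(float).tiny)
```

After this step, each nonzero row sums to 1, so every document is represented as a distribution over weighted words before the factorization algorithm is applied.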