Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Sparse NMF with Archetypal Regularization: Computational and Robustness Properties
Authors: Kayhan Behdin, Rahul Mazumder
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose new algorithms for our optimization problem, and present numerical experiments on synthetic and real data sets that shed further insights into our proposed framework and theoretical developments. Keywords: Sparse Nonnegative Matrix Factorization, Archetypal Analysis, Robustness to Perturbation, Model misspecification, Nonconvex Optimization |
| Researcher Affiliation | Academia | Kayhan Behdin EMAIL Operations Research Center Massachusetts Institute of Technology Cambridge, MA 02139, USA; Rahul Mazumder EMAIL MIT Sloan School of Management and Operations Research Center Massachusetts Institute of Technology Cambridge, MA 02139, USA |
| Pseudocode | Yes | Algorithm 1 summarizes the above procedure, where Psimplex(W) projects each row of W onto the unit simplex. See Duchi et al. (2008) for an efficient algorithm to calculate Psimplex. Before presenting the theoretical analysis of Algorithm 1, we define a stationarity point. ... Algorithm 1: Sparse AA(H0, W0, W0, λ) ... Algorithm 3: A Local Search improvement for Algorithm 1 |
| Open Source Code | Yes | Implementation can be found at https://github.com/kayhanbehdin/Sparse AA. |
| Open Datasets | Yes | We use the AT&T database of faces (Samaria and Harter, 1994), which consists of 40 different people and 10 different photos of each person, 400 images in total. ... We consider a real data set: the 14 Cancers Gene Expression data set (Ramaswamy et al., 2001). ... For our next set of experiments, we consider the Indian Pines data (Baumgardner et al., 2015), which is a hyperspectral image segmentation data set. ... We apply SAA on the Scene Categorization data set (Xiao et al., 2010). |
| Dataset Splits | No | The paper mentions using a "validation-based scheme" and a "held-out subset" (Section 4.4) for tuning parameters, and for one synthetic experiment it states "we draw a validation set of size mvalidation = m = 200" (Section 5.1.4). However, for most experiments and the real data sets, reproducible train/validation/test split details (specific percentages, counts, or citations to standard splits) are not provided in the main text. |
| Hardware Specification | Yes | Our experiments are done on a computer equipped with Intel(R) Core(R) i7 6700HQ CPU @ 2.60GHz, running Microsoft(R) Windows(R) 10 and using 16GB of RAM. |
| Software Dependencies | No | The paper states: "We implemented all of our algorithms in Julia and we use Gurobi(R) to solve MILPs arising in our initialization scheme." While it mentions the software names (Julia, Gurobi), it does not provide specific version numbers for either, which is necessary for reproducible software dependencies. |
| Experiment Setup | Yes | We set m = 200, k = 15, n = 5000, λ = 1. ... We set ℓ/nk = 0.8 and set the tuning parameters for different algorithms to get solutions that have 0.8nk nonzeros. ... we fix σz = 0.01 and change the value of ℓ ... We set ℓ = 0.8nk and σz = 0.1 ... We set σz = 1, ℓ = 0.5nk and choose λ from a logarithmic grid with 100 points between 0.0625 and 12.5. ... We consider k = 25 (following Hoyer (2004)) for this data set ... The rank of the factorization is 14 ... We also set the number of clusters to k = 17 from the ground truth ... We also set ℓ = 0.65nk. |
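The Pseudocode row quotes Algorithm 1's use of Psimplex(W), the row-wise Euclidean projection onto the unit simplex, citing Duchi et al. (2008). A minimal sketch of that projection step is below; the function names are illustrative (the paper's released implementation is in Julia, not Python), and this is the standard sort-based variant of the Duchi et al. algorithm, not code from the paper's repository.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the sort-based algorithm
    described in Duchi et al. (2008)."""
    u = np.sort(v)[::-1]                      # sort entries in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    # rho: index of the largest k with u_k - (cumsum_k - 1)/k > 0
    rho = np.nonzero(u - (css - 1.0) / ks > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)      # shift that enforces sum = 1
    return np.maximum(v - theta, 0.0)

def project_rows_to_simplex(W):
    """Row-wise Psimplex(W), the projection used in Algorithm 1."""
    return np.apply_along_axis(project_simplex, 1, W)
```

Each row of the output is nonnegative and sums to one, which is the constraint archetypal analysis places on the weight matrix W.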