Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sparse NMF with Archetypal Regularization: Computational and Robustness Properties

Authors: Kayhan Behdin, Rahul Mazumder

JMLR 2024

Research Type: Experimental
LLM Response: "We propose new algorithms for our optimization problem; and present numerical experiments on synthetic and real data sets that shed further insights into our proposed framework and theoretical developments. Keywords: Sparse Nonnegative Matrix Factorization, Archetypal Analysis, Robustness to Perturbation, Model misspecification, Nonconvex Optimization"

Researcher Affiliation: Academia
LLM Response: "Kayhan Behdin EMAIL Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Rahul Mazumder EMAIL MIT Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"

Pseudocode: Yes
LLM Response: "Algorithm 1 summarizes the above procedure, where P_simplex(W) projects each row of W onto the unit simplex. See Duchi et al. (2008) for an efficient algorithm to calculate P_simplex. Before presenting the theoretical analysis of Algorithm 1, we define a stationarity point. ... Algorithm 1: Sparse AA(H0, W0, W0, λ) ... Algorithm 3: A Local Search improvement for Algorithm 1"

Open Source Code: Yes
LLM Response: "Implementation can be found at https://github.com/kayhanbehdin/Sparse AA."

Open Datasets: Yes
LLM Response: "We use the AT&T database of faces (Samaria and Harter, 1994) which consists of 40 different people and 10 different photos of each person, 400 images in total. ... We consider a real data set: the 14 Cancers Gene Expression data set (Ramaswamy et al., 2001). ... For our next set of experiments, we consider the Indian Pines data (Baumgardner et al., 2015), which is a Hyperspectral image segmentation data set. ... We apply SAA on the Scene Categorization data set (Xiao et al., 2010)."

Dataset Splits: No
LLM Response: The paper mentions using a "validation-based scheme" and a "held-out subset" (Section 4.4) for tuning parameters, and, for one synthetic experiment, it states "we draw a validation set of size m_validation = m = 200" (Section 5.1.4). However, for most experiments and real data sets, explicit and reproducible details of training/validation/test splits (e.g., specific percentages, counts, or standard split citations) are not provided in the main text.

Hardware Specification: Yes
LLM Response: "Our experiments are done on a computer equipped with Intel(R) Core(R) i7 6700HQ CPU @ 2.60GHz, running Microsoft(R) Windows(R) 10 and using 16GB of RAM."

Software Dependencies: No
LLM Response: The paper states: "We implemented all of our algorithms in Julia and we use Gurobi(R) to solve MILPs arising in our initialization scheme." While it names the software used (Julia, Gurobi), it does not provide version numbers for either, which are necessary for reproducible software dependencies.

Experiment Setup: Yes
LLM Response: "We set m = 200, k = 15, n = 5000, λ = 1. ... We set ℓ/nk = 0.8 and set the tuning parameters for different algorithms to get solutions that have 0.8nk nonzeros. ... we fix σ_z = 0.01 and change the value of ℓ ... We set ℓ = 0.8nk and σ_z = 0.1 ... We set σ_z = 1, ℓ = 0.5nk and choose λ from a logarithmic grid with 100 points between 0.0625 and 12.5. ... We consider k = 25 (following Hoyer (2004)) for this data set ... The rank of the factorization is 14 ... We also set the number of clusters to k = 17 from the ground truth ... We also set ℓ = 0.65nk."
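The pseudocode excerpt above relies on P_simplex, which projects each row of W onto the unit simplex and cites Duchi et al. (2008) for an efficient algorithm. A minimal NumPy sketch of that sort-based projection is shown below; this is an illustrative implementation, not the authors' released code, and the function name is our own.

```python
import numpy as np

def project_rows_onto_simplex(W):
    """Project each row of W onto the unit simplex {w : w >= 0, sum(w) = 1}.

    Illustrative sketch of the P_simplex step referenced in Algorithm 1,
    following the O(k log k) sort-based method of Duchi et al. (2008).
    Not the authors' implementation.
    """
    W = np.asarray(W, dtype=float)
    n, k = W.shape
    # Sort each row in descending order and form running sums.
    U = -np.sort(-W, axis=1)
    css = np.cumsum(U, axis=1)
    idx = np.arange(1, k + 1)
    # The condition u_j > (css_j - 1)/j holds for a prefix of indices;
    # rho is the last index in that prefix (per row).
    cond = U - (css - 1.0) / idx > 0
    rho = cond.sum(axis=1) - 1
    # Per-row threshold, then clip at zero.
    theta = (css[np.arange(n), rho] - 1.0) / (rho + 1.0)
    return np.maximum(W - theta[:, None], 0.0)
```

Each returned row is nonnegative and sums to one; for example, the row [2, 0] projects to [1, 0], while [0.2, 0.3] (which lies below the simplex) projects to [0.45, 0.55].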
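The experiment-setup excerpt tunes λ over "a logarithmic grid with 100 points between 0.0625 and 12.5". As a small illustration, such a grid can be built as follows (the variable name is ours; the paper does not specify how the grid was generated):

```python
import numpy as np

# 100 logarithmically spaced values of the regularization
# parameter λ between 0.0625 and 12.5, as described in the paper.
lambda_grid = np.geomspace(0.0625, 12.5, num=100)
```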