Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
Authors: Ameya Velingker, Maximilian Vötsch, David Woodruff, Samson Zhou
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the practical use of our algorithmic ideas against existing algorithms. ... We perform two main types of experiments, first comparing the algorithm presented in the next section against existing baselines and then showing the feasibility of using coresets in the BMF setting. For a thorough discussion of our experimental results see Section G. |
| Researcher Affiliation | Collaboration | ¹Google Research, Mountain View, USA; ²Doctoral School Computer Science DoCS and Faculty of Computer Science, University of Vienna, Austria; ³Carnegie Mellon University, Pittsburgh, USA, and Google, Pittsburgh, USA; ⁴UC Berkeley and Rice University, USA. |
| Pseudocode | Yes | Algorithm 1: Low-rank approximation for matrix $\tilde{A}$ with $t$ distinct rows. Algorithm 2: Low-rank approximation for matrix $A$. Algorithm 3: Bicriteria low-rank approximation over $\mathbb{F}_2$ for matrix $A$. Algorithm 4: Low-rank approximation for matrix $\tilde{A}$ with $t$ distinct rows and $t$ distinct columns. Algorithm 5: Bicriteria low-rank approximation with $L_p$ loss for matrix $A$. |
| Open Source Code | No | We implemented our algorithm and the one of (Kumar et al., 2019) in Python. For solving k-means, we used the implementation of Lloyd's algorithm provided by the scikit-learn library (Pedregosa et al., 2011). There is no explicit statement about releasing the authors' code for their method, nor is a link provided. |
| Open Datasets | Yes | We choose two datasets from the UCI Machine Learning Repository (Dua & Graff, 2017), namely the voting record of the 98th Congress... and the Thyroid dataset, comprising 9371 patient records with 31 features. |
| Dataset Splits | No | The paper mentions using specific datasets but does not provide details on how these datasets were split into training, validation, and test sets (e.g., specific percentages or sample counts for each split). |
| Hardware Specification | Yes | All experiments were performed on a Linux notebook with a 3.9 GHz 12th-generation Intel Core i7 six-core processor with 32 GiB of RAM. |
| Software Dependencies | No | We implemented our algorithm and the one of (Kumar et al., 2019) in Python. For solving k-means, we used the implementation of Lloyd's algorithm provided by the scikit-learn library (Pedregosa et al., 2011). The paper mentions software names but does not provide specific version numbers for them. |
| Experiment Setup | No | The paper states: 'We choose the default parameters provided by the implementations and perform matrix operations in the setting specified by the algorithm. We limit the maximum number of iterations so a running time of 20 seconds is not exceeded.' While it mentions some experimental settings, it lacks specific hyperparameters for training, such as learning rates, batch sizes, or optimizer details. |
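
Since no code release accompanies the paper (see the Open Source Code row), the sketch below illustrates, under stated assumptions, the kind of pipeline the experiments describe: solve the clustering subproblem with scikit-learn's Lloyd's k-means, then round to binary factors. The function name `kmeans_bmf`, the 0.5 rounding threshold, and the toy data are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_bmf(A: np.ndarray, k: int, seed: int = 0):
    """Approximate a binary n x d matrix A as U @ V with binary factors.

    Sketch only: cluster the rows of A with Lloyd's k-means (the paper's
    experiments use scikit-learn's implementation for the k-means
    subroutine), then round the real-valued centroids to {0, 1}.
    U is the one-hot assignment matrix (n x k); V holds the rounded
    centroids (k x d). The rounding rule is an assumption.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A)
    V = (km.cluster_centers_ >= 0.5).astype(np.int8)  # round centroids to binary
    U = np.zeros((A.shape[0], k), dtype=np.int8)
    U[np.arange(A.shape[0]), km.labels_] = 1          # one-hot row assignments
    return U, V

rng = np.random.default_rng(0)
A = (rng.random((200, 31)) < 0.3).astype(np.int8)     # toy binary data
U, V = kmeans_bmf(A, k=5)
err = np.sum((A - U @ V) ** 2)                        # Frobenius objective
print(f"||A - UV||_F^2 = {err}")
```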
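
The coreset experiments and Algorithm 1's restriction to a matrix $\tilde{A}$ of $t$ distinct rows suggest a simple weighted reduction. The sketch below is a minimal stand-in for that idea (deduplicate rows, weight by multiplicity, run weighted k-means), not the paper's coreset construction, which carries $(1+\varepsilon)$ guarantees.

```python
import numpy as np
from sklearn.cluster import KMeans

def distinct_row_coreset(A: np.ndarray):
    """Collapse A to its distinct rows with multiplicity weights.

    Weighted k-means on the distinct rows has the same objective value
    as k-means on all rows of A, so the problem shrinks from n rows to
    t distinct rows without changing the optimum.
    """
    rows, weights = np.unique(A, axis=0, return_counts=True)
    return rows, weights

A = (np.random.default_rng(1).random((10_000, 16)) < 0.2).astype(np.int8)
rows, w = distinct_row_coreset(A)
km = KMeans(n_clusters=8, n_init=10, random_state=0)
km.fit(rows, sample_weight=w)  # weighted Lloyd's on the much smaller instance
print(f"{A.shape[0]} rows -> {rows.shape[0]} weighted distinct rows")
```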