Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
Authors: Ameya Velingker, Maximilian Vötsch, David Woodruff, Samson Zhou
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the practical use of our algorithmic ideas against existing algorithms. ... We perform two main types of experiments, first comparing the algorithm presented in the next section against existing baselines and then showing the feasibility of using coresets in the BMF setting. For a thorough discussion of our experimental results see Section G. |
| Researcher Affiliation | Collaboration | ¹Google Research, Mountain View, USA; ²Doctoral School Computer Science DoCS and Faculty of Computer Science, University of Vienna, Austria; ³Carnegie Mellon University, Pittsburgh, USA, and Google, Pittsburgh, USA; ⁴UC Berkeley and Rice University, USA. |
| Pseudocode | Yes | Algorithm 1: Low-rank approximation for matrix $\tilde{A}$ with $t$ distinct rows. Algorithm 2: Low-rank approximation for matrix $A$. Algorithm 3: Bicriteria low-rank approximation over $\mathbb{F}_2$ for matrix $A$. Algorithm 4: Low-rank approximation for matrix $\tilde{A}$ with $t$ distinct rows and $t$ distinct columns. Algorithm 5: Bicriteria low-rank approximation with $L_p$ loss for matrix $A$. |
| Open Source Code | No | We implemented our algorithm and the one of (Kumar et al., 2019) in Python. For solving k-means, we used the implementation of Lloyd's algorithm provided by the scikit-learn library (Pedregosa et al., 2011). There is no explicit statement about releasing the authors' code for their method, nor is a link provided. |
| Open Datasets | Yes | We choose two datasets from the UCI Machine Learning Repository (Dua & Graff, 2017), namely the voting record of the 98th Congress... and the Thyroid dataset, comprising 9371 patient records with 31 features. |
| Dataset Splits | No | The paper mentions using specific datasets but does not provide details on how these datasets were split into training, validation, and test sets (e.g., specific percentages or sample counts for each split). |
| Hardware Specification | Yes | All experiments were performed on a Linux notebook with a 3.9 GHz 12th-generation Intel Core i7 six-core processor with 32 GiB of RAM. |
| Software Dependencies | No | We implemented our algorithm and the one of (Kumar et al., 2019) in Python. For solving k-means, we used the implementation of Lloyd's algorithm provided by the scikit-learn library (Pedregosa et al., 2011). The paper mentions software names but does not provide specific version numbers for them. |
| Experiment Setup | No | The paper states: 'We choose the default parameters provided by the implementations and perform matrix operations in the setting specified by the algorithm. We limit the maximum number of iterations so a running time of 20 seconds is not exceeded.' While it mentions some experimental settings, it lacks specific hyperparameters for training, such as learning rates, batch sizes, or optimizer details. |
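
Since no code release accompanies the paper (see the Open Source Code row), the sketch below illustrates, under stated assumptions, the kind of pipeline the experiments describe: solve the clustering subproblem with scikit-learn's Lloyd's k-means, then round to binary factors. The function name `kmeans_bmf`, the 0.5 rounding threshold, and the toy data are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_bmf(A: np.ndarray, k: int, seed: int = 0):
    """Approximate a binary n x d matrix A as U @ V with binary factors.

    Sketch only: cluster the rows of A with Lloyd's k-means (the paper's
    experiments use scikit-learn's implementation for the k-means
    subroutine), then round the real-valued centroids to {0, 1}.
    U is the one-hot assignment matrix (n x k); V holds the rounded
    centroids (k x d). The rounding rule is an assumption.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(A)
    V = (km.cluster_centers_ >= 0.5).astype(np.int8)  # round centroids to binary
    U = np.zeros((A.shape[0], k), dtype=np.int8)
    U[np.arange(A.shape[0]), km.labels_] = 1          # one-hot row assignments
    return U, V

rng = np.random.default_rng(0)
A = (rng.random((200, 31)) < 0.3).astype(np.int8)     # toy binary data
U, V = kmeans_bmf(A, k=5)
err = np.sum((A - U @ V) ** 2)                        # Frobenius objective
print(f"||A - UV||_F^2 = {err}")
```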
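
The coreset experiments and Algorithm 1's restriction to a matrix $\tilde{A}$ of $t$ distinct rows suggest a simple weighted reduction. The sketch below is a minimal stand-in for that idea (deduplicate rows, weight by multiplicity, run weighted k-means), not the paper's coreset construction, which carries $(1+\varepsilon)$ guarantees.

```python
import numpy as np
from sklearn.cluster import KMeans

def distinct_row_coreset(A: np.ndarray):
    """Collapse A to its distinct rows with multiplicity weights.

    Weighted k-means on the distinct rows has the same objective value
    as k-means on all rows of A, so the problem shrinks from n rows to
    t distinct rows without changing the optimum.
    """
    rows, weights = np.unique(A, axis=0, return_counts=True)
    return rows, weights

A = (np.random.default_rng(1).random((10_000, 16)) < 0.2).astype(np.int8)
rows, w = distinct_row_coreset(A)
km = KMeans(n_clusters=8, n_init=10, random_state=0)
km.fit(rows, sample_weight=w)  # weighted Lloyd's on the much smaller instance
print(f"{A.shape[0]} rows -> {rows.shape[0]} weighted distinct rows")
```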