Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Cardinality Sparsity: Applications in Matrix-Matrix Multiplications and Machine Learning
Authors: Ali Mohaddes, Johannes Lederer
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first present experimental evidence demonstrating the advantages of sparsity in reducing multiplication and memory costs. We then implement cardinality sparsity in machine-learning systems. Our results underscore the substantial benefits of cardinality sparsity. This section validates our results. First, we test the methods for matrix-matrix multiplication introduced in Section 3.1. We evaluate our approach on both simulated and real-world datasets. The empirical results demonstrate large gains in speed, especially for large matrices. As illustrated in Figure 2, our multiplication technique surpasses both Strassen's and the conventional algorithm when handling large matrices. |
| Researcher Affiliation | Academia | Ali Mohaddes EMAIL Department of Mathematics University of Hamburg Johannes Lederer EMAIL Department of Mathematics University of Hamburg |
| Pseudocode | Yes | A.1 Algorithms In this section, we review matrix-matrix multiplication algorithms. Algorithm 1 Standard Multiplication ... Algorithm 2 Multiplication by Cardinality Sparsity (P > M, N) ... Algorithm 3 Multiplication by Cardinality Sparsity (P < M, N) |
| Open Source Code | No | The paper provides C implementations in Listing 1 and Listing 2 within the appendix. However, there is no explicit statement about releasing the code or a link to an external repository. |
| Open Datasets | Yes | Specifically, we applied our multiplication method to the real datasets (Letter Recognition Dataset, Letter Digits Dataset, and Firm Teacher Clave Direction Classification (Slate, 1991; Alpaydin, 1998; Vurka, 2011)) and compared the results with the standard and Strassen multiplication methods. ... We provide examples using real datasets: the Yeast dataset (Nakai, 1991), the Concrete Compressive Strength dataset (Yeh, 1998), and the AI4I 2020 Predictive Maintenance Dataset (ai4, 2020), to demonstrate the efficiency of cardinality sparsity in reducing memory usage in neural networks. ... For the neural-networks application, we use the Optical-Recognition-of-Handwritten-Digits data (Dua & Graff, 2017). |
| Dataset Splits | No | The paper mentions using real-world datasets like 'Letter Recognition Dataset' and 'Optical-Recognition-of-Handwritten-Digits data' and states, 'We train a two-layer relu network with 40 nodes in the hidden layer with gradient descent.' It also mentions generating 'random matrices' for some experiments and 'samples generated by a standard normal distribution' for other tests. However, it does not provide specific train/test/validation split percentages, sample counts, or refer to standard predefined splits for any of the datasets used in its experiments. |
| Hardware Specification | Yes | Note that all simulations are performed on a MacBook Pro laptop with an 8-core CPU and 16 GB of RAM. |
| Software Dependencies | No | We applied the Tensorly package for this simulation (Kossaifi et al., 2016). The provided C implementation listings use standard libraries like numpy and matplotlib. However, no specific version numbers are mentioned for Tensorly or any other software libraries used. |
| Experiment Setup | Yes | For the neural-networks application, we use the Optical-Recognition-of-Handwritten-Digits data (Dua & Graff, 2017). We train a two-layer ReLU network with 40 nodes in the hidden layer using gradient descent. For backpropagation, we use Algorithm 2; for forward operations, we use Algorithm 3. After each weight update, we project the weight matrix of the hidden layer onto a cardinality-sparse matrix: we sort and split each column into k partitions, where k is the sparsity degree, and then replace the old values with the mean of the values in each partition... We trained a four-layer neural network whose layers contain 2, 10, 15, and 20 nodes, respectively, with the Inverse Square Root Unit (x/√(1+x²)), identity, arctan, and again identity as activation functions. ... We trained the network with 1000 iterations. |
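The projection step quoted in the Experiment Setup row (sort each column, split it into k partitions, replace entries by the partition mean) can be sketched as follows. This is a minimal illustration of that description, not the authors' released code; the function name `project_cardinality_sparse` and the use of numpy are our own choices, and it assumes k does not exceed the number of rows.

```python
import numpy as np

def project_cardinality_sparse(W, k):
    """Project each column of W onto a column with at most k distinct
    values: sort the entries, split the sorted order into k contiguous
    partitions, and replace every entry by its partition's mean.
    (Hypothetical sketch of the projection described in the paper.)"""
    W = np.asarray(W, dtype=float)
    out = np.empty_like(W)
    for j in range(W.shape[1]):
        order = np.argsort(W[:, j])          # indices that sort column j
        col = np.empty(W.shape[0])
        for part in np.array_split(order, k):  # k near-equal partitions
            col[part] = W[part, j].mean()      # collapse partition to its mean
        out[:, j] = col
    return out
```

After this projection, every column carries at most k distinct values, which is the cardinality-sparsity structure the paper's multiplication algorithms exploit.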
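To make the speedup claim concrete: when each column of a factor has few distinct values, a product A·B can group repeated entries so that each output column needs one multiplication per distinct value rather than one per row. The sketch below illustrates this grouping idea only; it is not the paper's Algorithm 2 or 3 (whose details are not reproduced here), and `multiply_cardinality_sparse` is a hypothetical name.

```python
import numpy as np

def multiply_cardinality_sparse(A, B):
    """Compute A @ B (A is M x N, B is N x P) by grouping the entries
    of each column of B by value: sum the matching columns of A once
    per group, then scale by the shared value. With k distinct values
    per column, this uses k scalings per output column instead of N.
    (Illustrative sketch, not the paper's algorithm.)"""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    M, _ = A.shape
    _, P = B.shape
    C = np.zeros((M, P))
    for j in range(P):
        for v in np.unique(B[:, j]):
            if v == 0.0:
                continue                     # zero entries contribute nothing
            mask = B[:, j] == v              # rows of column j sharing value v
            C[:, j] += v * A[:, mask].sum(axis=1)  # one scaling per group
    return C
```

The savings grow as the number of distinct values per column shrinks relative to the matrix dimension, which matches the paper's observation that the gains are largest for large matrices.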