Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing
Authors: Nived Rajaraman, Fnu Devvrit, Aryan Mokhtari, Kannan Ramchandran
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: We run gradient updates to minimize the loss in eq. (2) until convergence. The model is highly overparameterized with d = k = 500 and r = 4. In the left figure, the vertical axis represents the fraction of columns having their norm at least x times the largest norm across columns, with x ∈ [0, 1] on the horizontal axis. In the right figure, for the same experiment we plot a histogram of column norms. Using regularization leads to solutions with most of the columns having small ℓ2 norm, and only a few significant columns. Without regularization, a large number of columns have their ℓ2 norm comparable to the largest column norm. Thus, training with explicit regularization leads to models that are more suitable for pruning. |
| Researcher Affiliation | Academia | Nived Rajaraman, EECS Department, University of California, Berkeley; Devvrit, CS Department, The University of Texas at Austin; Aryan Mokhtari, ECE Department, The University of Texas at Austin; Kannan Ramchandran, EECS Department, University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 Greedy pruning based on group-Lasso regularization |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | Given n observation matrices {A_i}_{i=1}^n, in the matrix sensing framework, the learner is provided measurements y_i = ⟨A_i, UU^T⟩ + ε_i, where ⟨·, ·⟩ indicates the trace inner product and ε_i is measurement noise assumed to be distributed i.i.d. N(0, σ^2). [...] Figure 1: We run gradient updates to minimize the loss in eq. (2) until convergence. The model is highly overparameterized with d = k = 500 and r = 4. |
| Dataset Splits | No | The paper describes theoretical analysis and a conceptual algorithm (Algorithm 1) but does not provide details on training, validation, or test dataset splits for empirical evaluation as commonly seen in machine learning papers. It discusses 'n samples' in the finite sample analysis section but does not specify how they are split for empirical validation. |
| Hardware Specification | No | The paper focuses on theoretical analysis and a simulated example (Figure 1) and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | Algorithm 1: Inputs: Set parameters λ, β, ϵ, γ and m_fine-tune. [...] Theorem 3. Consider the population loss with regularization in eq. (9), where U has rank r and its smallest singular value is denoted by σ_r. Moreover, consider U_prune as the output of the pruning phase in Algorithm 1 with parameters β, λ, ϵ, γ satisfying the conditions β = c_β (σ_r)^2 / r, λ = c_λ (σ_r)^3 / (kr), γ ≤ c_γ (σ_r)^3 / (k r^{5/2}), ϵ ≤ c_ϵ (σ_r)^{7/2} / (k r^{5/2}), for some absolute constants c_β, c_λ, c_ϵ, c_γ > 0. [...] Figure 1: The model is highly overparameterized with d = k = 500 and r = 4. [...] Theorem 1. [...] suppose the entries of the initial model U_0 are i.i.d. samples from N(0, α^2), where α ≤ c_1 / (k^3 d log(kd)) for some constant c_1 > 0. |
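Since the paper releases no code, the setup above can only be illustrated by a re-implementation. The following is a minimal numpy sketch, not the authors' implementation: matrix sensing with a group-Lasso penalty on the columns of U, trained by (sub)gradient descent and followed by magnitude-based column pruning in the spirit of Algorithm 1. The dimensions are scaled down from the paper's d = k = 500, r = 4, and the values of lam, lr, and steps are illustrative assumptions, not the constants prescribed by Theorem 3.

```python
import numpy as np

# Illustrative sketch of group-Lasso-regularized matrix sensing + pruning.
# All hyperparameters below are assumptions for demonstration only.
rng = np.random.default_rng(0)
d, k, r, n = 20, 20, 2, 800          # overparameterized model: k >> r
lam, lr, steps = 0.1, 0.002, 2000    # penalty weight, step size, iterations

U_star = rng.normal(size=(d, r))                   # ground-truth factor
A = rng.normal(size=(n, d, d))                     # Gaussian sensing matrices
y = np.einsum('nij,ij->n', A, U_star @ U_star.T)   # noiseless measurements

def loss(U):
    """Mean squared sensing residual plus group-Lasso column penalty."""
    resid = np.einsum('nij,ij->n', A, U @ U.T) - y
    return np.mean(resid ** 2) + lam * np.linalg.norm(U, axis=0).sum()

U = 0.01 * rng.normal(size=(d, k))   # small random initialization
init_loss = loss(U)
for _ in range(steps):
    resid = np.einsum('nij,ij->n', A, U @ U.T) - y
    # d/dU <A_i, U U^T> = (A_i + A_i^T) U, so the data-fit gradient is
    # (2/n) * sum_i resid_i * (A_i + A_i^T) @ U
    G = np.einsum('n,nij->ij', resid, A + A.transpose(0, 2, 1))
    grad = (2.0 / n) * G @ U
    # subgradient of the group-Lasso term lam * sum_j ||U[:, j]||_2
    col_norms = np.linalg.norm(U, axis=0, keepdims=True)
    grad += lam * U / np.maximum(col_norms, 1e-12)
    U -= lr * grad

# Prune columns whose norm is a small fraction of the largest column norm,
# mirroring the column-norm statistic plotted in Figure 1.
col_norms = np.linalg.norm(U, axis=0)
keep = col_norms >= 0.1 * col_norms.max()
U_pruned = U[:, keep]
```

With the penalty active, most columns shrink toward zero and only a few significant columns survive the threshold, which is the qualitative behavior Figure 1 reports for the regularized run.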