Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing
Authors: Nived Rajaraman, Fnu Devvrit, Aryan Mokhtari, Kannan Ramchandran
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1: We run gradient updates to minimize the loss in eq. (2) until convergence. The model is highly overparameterized with d = k = 500 and r = 4. In the left figure, the vertical axis represents the fraction of columns having their norm at least x times the largest norm across columns, with x ∈ [0, 1] on the horizontal axis. In the right figure, for the same experiment we plot a histogram of column norms. Using regularization leads to solutions in which most columns have small ℓ2 norm and only a few columns are significant. Without regularization, a large number of columns have ℓ2 norm comparable to the largest column norm. Thus, training with explicit regularization leads to models that are more suitable for pruning. (A code sketch of this setup appears after the table.) |
| Researcher Affiliation | Academia | Nived Rajaraman, EECS Department, University of California, Berkeley (nived.rajaraman@eecs.berkeley.edu); Devvrit, CS Department, The University of Texas at Austin (devvrit.03@gmail.com); Aryan Mokhtari, ECE Department, The University of Texas at Austin (mokhtari@austin.utexas.edu); Kannan Ramchandran, EECS Department, University of California, Berkeley (kannanr@eecs.berkeley.edu) |
| Pseudocode | Yes | Algorithm 1: Greedy pruning based on group-Lasso regularization (a hedged sketch of this prune-then-fine-tune loop appears after the table) |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | Given n observation matrices {A_i}_{i=1}^n, in the matrix sensing framework the learner is provided measurements y_i = ⟨A_i, U⋆(U⋆)ᵀ⟩ + ε_i, where ⟨·, ·⟩ denotes the trace inner product and ε_i is measurement noise assumed to be distributed i.i.d. N(0, σ²). [...] Figure 1: We run gradient updates to minimize the loss in eq. (2) until convergence. The model is highly overparameterized with d = k = 500 and r = 4. |
| Dataset Splits | No | The paper describes theoretical analysis and a conceptual algorithm (Algorithm 1) but does not provide details on training, validation, or test dataset splits for empirical evaluation as commonly seen in machine learning papers. It discusses 'n samples' in the finite sample analysis section but does not specify how they are split for empirical validation. |
| Hardware Specification | No | The paper focuses on theoretical analysis and a simulated example (Figure 1) and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | Algorithm 1: Inputs: Set parameters λ, β, ϵ, γ and m_fine-tune. [...] Theorem 3. Consider the population loss with regularization in eq. (9), where U⋆ has rank r and its smallest singular value is denoted by σ⋆_r. Moreover, consider U_prune as the output of the pruning phase in Algorithm 1 with parameters β, λ, ϵ, γ satisfying the conditions β = c_β (σ⋆_r)^2 / r, λ = c_λ (σ⋆_r)^3 / (kr), γ ≤ c_γ (σ⋆_r)^3 / (k r^(5/2)), ϵ ≤ c_ϵ (σ⋆_r)^(7/2) / (k r^(5/2)), for some absolute constants c_β, c_λ, c_ϵ, c_γ > 0. [...] Figure 1: The model is highly overparameterized with d = k = 500 and r = 4. [...] Theorem 1. [...] suppose the entries of the initial model U0 are i.i.d. samples from N(0, α²), where α ≤ c₁/(k³ d log(kd)) for some constant c₁ > 0. |
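
The Figure 1 experiment quoted above (regularized gradient descent on an overparameterized factor, followed by inspection of the column-norm profile) can be reproduced in outline with a short PyTorch script. The sketch below assumes that the loss in eq. (2) is the standard squared loss over the measurements y_i = ⟨A_i, U⋆(U⋆)ᵀ⟩ + ε_i plus a group-lasso penalty on the columns of U; the dimensions, hyperparameter values, and variable names (U_star, lam, alpha, ...) are illustrative choices rather than the paper's, and the problem size is shrunk from d = k = 500 so it runs quickly on a CPU.

```python
# A minimal sketch of the Figure 1 setup, assuming eq. (2) is the squared loss
# over the measurements plus a group-lasso (column-wise l2) penalty on U.
# All dimensions, hyperparameters, and names below are illustrative choices.
import torch

torch.manual_seed(0)
d, k, r = 50, 50, 4          # overparameterized: k columns, true rank r (paper uses d = k = 500)
n, sigma = 2000, 0.01        # number of measurements and noise level
lam, alpha = 1e-2, 1e-3      # group-lasso weight and small-initialization scale
lr, steps = 2e-2, 2000

# Ground-truth factor U* and Gaussian sensing matrices A_i; measurements follow
# y_i = <A_i, U* U*^T> + eps_i with eps_i ~ N(0, sigma^2).
U_star = torch.randn(d, r) / d ** 0.5
A = torch.randn(n, d, d)
y = torch.einsum('nij,ij->n', A, U_star @ U_star.T) + sigma * torch.randn(n)

# Overparameterized factor U in R^{d x k}, small random initialization.
U = (alpha * torch.randn(d, k)).requires_grad_()
opt = torch.optim.SGD([U], lr=lr)

for _ in range(steps):
    opt.zero_grad()
    preds = torch.einsum('nij,ij->n', A, U @ U.T)
    data_loss = ((y - preds) ** 2).mean()
    # Group-lasso penalty: sum of column l2 norms (tiny eps keeps the gradient
    # stable when a column is driven close to zero).
    group_lasso = (U.pow(2).sum(dim=0) + 1e-12).sqrt().sum()
    (data_loss + lam * group_lasso).backward()
    opt.step()

# Column-norm profile plotted in Figure 1: fraction of columns whose l2 norm
# is at least x times the largest column norm.
col_norms = U.detach().norm(dim=0)
for x in (0.1, 0.25, 0.5):
    frac = (col_norms >= x * col_norms.max()).float().mean().item()
    print(f"fraction of columns with norm >= {x:.2f} * max: {frac:.3f}")
```

The prune-then-fine-tune shape of Algorithm 1 can be sketched in the same way. The paper's pruning phase is governed by β, γ, and ϵ with the scalings quoted in Theorem 3 above; the function below instead uses a simple relative column-norm threshold and a fixed number of fine-tuning steps as illustrative stand-ins, reusing A, y, U, and col_norms from the previous sketch.

```python
# A hedged sketch of the prune-then-fine-tune structure of Algorithm 1.
# The relative threshold and m_finetune value are illustrative stand-ins,
# not the paper's beta/gamma/epsilon settings from Theorem 3.
def greedy_prune_and_finetune(U, A, y, threshold, m_finetune=500, lr=2e-2):
    """Drop columns of U whose l2 norm is below `threshold`, then fine-tune
    the surviving columns on the unregularized squared loss."""
    keep = U.detach().norm(dim=0) >= threshold
    U_pruned = U.detach()[:, keep].clone().requires_grad_()
    opt = torch.optim.SGD([U_pruned], lr=lr)
    for _ in range(m_finetune):
        opt.zero_grad()
        preds = torch.einsum('nij,ij->n', A, U_pruned @ U_pruned.T)
        ((y - preds) ** 2).mean().backward()
        opt.step()
    return U_pruned

# Illustrative threshold: keep columns within 10% of the largest column norm.
U_pruned = greedy_prune_and_finetune(U, A, y, threshold=0.1 * col_norms.max())
print("columns kept after pruning:", U_pruned.shape[1])
```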
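With the regularized training above, only a handful of columns should retain a norm comparable to the maximum, so the pruning step keeps a small factor that the fine-tuning phase then fits without the group-lasso penalty; this mirrors the qualitative behavior the paper reports for Figure 1, though the exact numbers here depend on the illustrative hyperparameters.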