No Free Prune: Information-Theoretic Barriers to Pruning at Initialization
Authors: Tanishq Kumar, Kevin Luo, Mark Sellke
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on neural networks confirm that information gained during training may indeed affect model capacity. |
| Researcher Affiliation | Academia | Harvard University. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We used the pruning code of (Tanaka et al., 2020)' which refers to third-party code, but does not provide specific access to source code developed for this paper's methodology. |
| Open Datasets | Yes | We consider a two-hidden-layer network with ReLU activation, on a train set of points (Gaussian data in Figure 1(a) and Fashion-MNIST in Figure 1(b)), as well as a convNet on noisy CIFAR-10 in Figure 1(c)... |
| Dataset Splits | No | The paper mentions using a 'train set' but does not specify explicit training, validation, or test dataset splits, or reference standard splits that define these proportions. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam' as an optimizer and 'ReLU activation' and refers to 'the pruning code of (Tanaka et al., 2020)', but it does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We train till convergence in loss to within 0.01 (or until accuracy doesn't change for three consecutive epochs), with η = 1e-3 on Adam. We use a batch size of 64 with a two-hidden-layer ReLU architecture with a hidden width of 200. |
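The architecture named in the setup row above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the input dimension (784, matching Fashion-MNIST) and output dimension (10) are assumptions, as the report does not state them; only the hidden width of 200, the two ReLU hidden layers, and the batch size of 64 come from the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in=784, width=200, d_out=10):
    """He-style initialization for a two-hidden-layer ReLU network.

    d_in and d_out are assumed values; width=200 matches the paper.
    """
    sizes = [(d_in, width), (width, width), (width, d_out)]
    return [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)) for m, n in sizes]

def forward(params, x):
    """Forward pass: two ReLU hidden layers, linear output."""
    w1, w2, w3 = params
    h1 = np.maximum(x @ w1, 0.0)
    h2 = np.maximum(h1 @ w2, 0.0)
    return h2 @ w3

params = init_mlp()
batch = rng.normal(size=(64, 784))  # batch size 64, as in the setup
logits = forward(params, batch)
print(logits.shape)  # → (64, 10)
```

In the paper's setup these weights would then be trained with Adam at η = 1e-3 until the loss converges to within 0.01 (or accuracy plateaus for three epochs); that training loop is omitted here since the report gives no further optimizer details.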