Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Authors: Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Irina Rish, Pierre-Luc Bacon, Razvan Pascanu, Aristide Baratin
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on CIFAR-10 and ImageNet datasets demonstrate that DemP outperforms existing dense-to-sparse structured pruning methods, achieving better accuracy-sparsity tradeoffs and accelerating training by up to 3.56×. |
| Researcher Affiliation | Collaboration | Simon Dufort-Labbé (Mila, Université de Montréal); Pierluca D'Oro (Mila, Université de Montréal); Evgenii Nikishin (Mila, Université de Montréal); Irina Rish (Mila, Université de Montréal); Razvan Pascanu (Google DeepMind); Pierre-Luc Bacon (Mila, Université de Montréal); Aristide Baratin (Samsung SAIL Montreal; Mila, Université de Montréal) |
| Pseudocode | Yes | Algorithm 1: DemP Algorithm |
| Open Source Code | Yes | The code for our experiments is available here. |
| Open Datasets | Yes | Experiments on CIFAR-10 and ImageNet datasets demonstrate that DemP outperforms existing dense-to-sparse structured pruning methods, achieving better accuracy-sparsity tradeoffs and accelerating training by up to 3.56×. |
| Dataset Splits | Yes | We focus our experiments on computer vision tasks, which is standard in pruning literature (Gale et al., 2019). We train ResNet-18 and VGG-16 networks on CIFAR-10, and ResNet-50 networks on ImageNet (He et al., 2016; Simonyan & Zisserman, 2015; Krizhevsky et al., 2009; Deng et al., 2009). We follow the training regimes from Evci et al. (2020) for ResNet architectures and use a setting similar to Rachwan et al. (2022) for the VGG to broaden the scope of our experiments. |
| Hardware Specification | Yes | The training utilized a Nvidia RTX8000 GPU. Training utilized a Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions software like JAX (Bradbury et al., 2018) and TorchVision (maintainers & contributors, 2016) but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | ResNet-18. We train all networks for 250 epochs using a batch size of 128. The learning rate is initially set to 0.005 for Adam, to 0.1 for SGDM, and is thereafter divided by 5 every 77 epochs. While varying regularization is used with our method, it is on top of a constant weight decay (0.0005) used across all methods, including ours. Random crop and random horizontal flips are used for data augmentation. ... ResNet-50. We trained the ResNet-50 for 100 epochs, with a batch size of 256 instead of 4096. The initial learning rate is set to 0.005, before being decayed by a factor of 10 at epochs 30, 70, and 90. Label smoothing (0.1) and data augmentation (random resize to either 256×256 or 480×480, before randomly cropping to 224×224, followed by random horizontal flip and input normalization) are also used. We again use Adam and SGDM, using constant weight decay (0.0001) for both. |
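The two learning-rate schedules quoted above (ResNet-18: divide by 5 every 77 epochs; ResNet-50: decay by 10 at epochs 30, 70, and 90) can be sketched as plain Python functions. This is a minimal illustration of the schedules as described, not the paper's actual training code; the function names and default base rates are assumptions drawn from the quoted setup.

```python
def resnet18_lr(epoch, base_lr=0.1):
    """ResNet-18 setup: base LR (0.1 for SGDM, 0.005 for Adam)
    divided by 5 every 77 epochs over the 250-epoch run."""
    return base_lr / (5 ** (epoch // 77))


def resnet50_lr(epoch, base_lr=0.005):
    """ResNet-50 setup: base LR 0.005, decayed by a factor of 10
    at each of the milestone epochs 30, 70, and 90."""
    milestones = (30, 70, 90)
    passed = sum(epoch >= m for m in milestones)  # milestones already reached
    return base_lr / (10 ** passed)
```

For example, `resnet18_lr(77)` gives 0.02 (one division by 5), and `resnet50_lr(95)` gives 5e-6 after all three milestone decays.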