Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Partially Frozen Random Networks Contain Compact Strong Lottery Tickets
Authors: Hikari Otsuka, Daiki Chijiwa, Ángel López García-Arias, Yasuyuki Okoshi, Kazushi Kawamura, Thiem Van Chu, Daichi Fujiki, Susumu Takeuchi, Masato Motomura
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper proposes a method for reducing the SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen by either permanently pruning them or locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with a better accuracy-to-model-size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing 70% of a ResNet on ImageNet provides 3.3× compression compared to the SLT found within a dense counterpart, raises accuracy by up to 14.12 points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-model size trade-off than both. |
| Researcher Affiliation | Collaboration | 1Department of Information and Communications Engineering, Institute of Science Tokyo, Japan 2NTT Corporation, Japan |
| Pseudocode | No | The paper describes methods and proofs using mathematical notation but does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | We evaluate the SLTs within frozen networks on image classification using the CIFAR-10 (Krizhevsky, 2009) and ImageNet (Russakovsky et al., 2015) datasets, and on node classification using the OGBN-Arxiv (Hu et al., 2020) dataset. |
| Dataset Splits | Yes | CIFAR-10 and ImageNet train data are split into training and validation sets with a 4:1 ratio, while for OGBN-Arxiv we use the default set split. |
| Hardware Specification | Yes | These can be adequately verified with two NVIDIA H100 SXM5 94GB GPUs. |
| Software Dependencies | No | The paper mentions optimizers (stochastic gradient descent, AdamW) and learning rate decay (cosine learning rate decay), but does not specify software library names with version numbers (e.g., PyTorch, TensorFlow, Python version) used for implementation. |
| Experiment Setup | Yes | On CIFAR-10, scores are optimized for 100 epochs using stochastic gradient descent with momentum 0.9, batch size 128, weight decay 0.0001, and initial learning rates of 0.01 and 0.1 for Conv6 and ResNet-18, respectively. On ImageNet, scores are optimized with the same settings as ResNet-18 on CIFAR-10, but with a batch size of 256. On OGBN-Arxiv, scores are optimized for 400 epochs using AdamW (Loshchilov & Hutter, 2019) with weight decay 0.0001 and initial learning rate 0.01. All experiments use cosine learning rate decay (Loshchilov & Hutter, 2017). |
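The cosine learning rate decay cited in the setup row (Loshchilov & Hutter, 2017) can be sketched as a small standalone schedule function. This is a minimal illustration of the standard cosine-annealing formula, not code from the paper; the function name and the `lr_min=0` default are assumptions for this sketch.

```python
import math

def cosine_lr(step, total_steps, lr_init, lr_min=0.0):
    """Cosine learning rate decay (Loshchilov & Hutter, 2017).

    Anneals from lr_init at step 0 down to lr_min at total_steps
    following half a cosine period. lr_min=0.0 is an assumed default.
    """
    progress = step / total_steps
    return lr_min + 0.5 * (lr_init - lr_min) * (1 + math.cos(math.pi * progress))

# Using the CIFAR-10 Conv6 setting from the table (initial LR 0.01, 100 epochs):
print(cosine_lr(0, 100, 0.01))    # starts at the initial learning rate, 0.01
print(cosine_lr(50, 100, 0.01))   # halfway through, the rate is halved: 0.005
print(cosine_lr(100, 100, 0.01))  # decays to 0.0 by the final epoch
```

At the midpoint the cosine term is zero, so the rate is exactly half the initial value; this smooth decay is why the schedule needs no manually chosen step milestones.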