Universality of Winning Tickets: A Renormalization Group Perspective
Authors: William T. Redman, Tianlong Chen, Zhangyang Wang, Akshunna S. Dogra
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that ResNet-50 models with transferable winning tickets have flows with common properties, as would be expected from the theory. Similar observations are made for BERT models, with evidence that their flows are near fixed points. Additionally, we leverage our framework to study winning tickets transferred across ResNet architectures, observing that smaller models have flows with more uniform properties than larger models, complicating transfer between them. |
| Researcher Affiliation | Academia | (1) Interdepartmental Graduate Program in Dynamical Neuroscience, University of California, Santa Barbara; (2) Department of Electrical and Computer Engineering, University of Texas at Austin; (3) Department of Mathematics, Imperial College London; (4) EPSRC CDT in Mathematics of Random Systems: Analysis, Modelling and Simulation. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | No explicit statement about releasing code or a link to a code repository for the methodology described in this paper was found. |
| Open Datasets | Yes | Example 2D slices of these are plotted in Fig. 2 of Appendix C for ResNet-50 trained on CIFAR-10 and CIFAR-100 from random initialization, with 5% rewind. This data comes from experiments performed by Chen et al. (2021a). Recent work has found that using DNN parameters from models that have been pre-trained on complex tasks allows for substantial transfer (Chen et al., 2020; 2021a). We therefore examined the effect of pre-training using ImageNet (Huh et al., 2016). We computed the σ_i on tickets that were found by applying IMP to pre-trained BERT models on ten downstream NLP tasks (Rajpurkar et al., 2016; Wang et al., 2018), which were known to allow for ticket transfer. Data comes from experiments performed by Chen et al. (2020). |
| Dataset Splits | No | The paper mentions using datasets like CIFAR-10, CIFAR-100, and BERT NLP tasks, but does not explicitly provide information on how these datasets were split into training, validation, and test sets (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "scipy.optimize.curve_fit (Jones et al., 2001)" for numerical computations, but it does not specify version numbers for SciPy or any other key software dependency (e.g., Python, PyTorch, TensorFlow) used in the experiments (see the curve-fitting sketch after this table). |
| Experiment Setup | Yes | Example 2D slices of these are plotted in Fig. 2 of Appendix C for ResNet-50 trained on CIFAR-10 and CIFAR-100 from random initialization, with 5% rewind. This data comes from experiments performed by Chen et al. (2021a). In the case of the computer vision experiments (Secs. 5.1 and 5.3), the models were sparsified 20% each round (see the IMP sketch after this table). |
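
The curve-fitting dependency noted in the table can be illustrated concretely. Below is a minimal sketch of using scipy.optimize.curve_fit to fit a decay model to a per-round statistic such as the σ_i measured across pruning iterations; the exponential model form and the synthetic data are illustrative assumptions, not the paper's actual fitting procedure.

```python
# Minimal sketch: fitting a decay curve to a per-round statistic with
# scipy.optimize.curve_fit. The exponential model and synthetic data are
# illustrative assumptions; the paper does not specify its fit form here.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, a, b, c):
    """Exponential decay model: a * exp(-b * t) + c."""
    return a * np.exp(-b * t) + c

# Synthetic stand-in for a statistic (e.g., sigma_i) over pruning rounds.
rounds = np.arange(10, dtype=float)
rng = np.random.default_rng(0)
sigma = decay(rounds, 1.0, 0.4, 0.1) + 0.01 * rng.normal(size=rounds.size)

# Fit the model and report parameters with one-sigma uncertainties.
params, cov = curve_fit(decay, rounds, sigma, p0=(1.0, 0.5, 0.0))
errors = np.sqrt(np.diag(cov))
for name, value, err in zip("abc", params, errors):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```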
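
The 20%-per-round schedule in the experiment setup refers to iterative magnitude pruning (IMP), the standard procedure for finding winning tickets. The sketch below is a framework-agnostic NumPy illustration of that schedule; the `prune_by_magnitude` helper, the stubbed training step, and the rewind-to-initialization step are assumptions for illustration (the paper rewinds to weights from 5% into training), not the authors' pipeline.

```python
# Hypothetical sketch of iterative magnitude pruning (IMP) at 20% per round.
# Training is stubbed out with noise; real pipelines train (weights * mask).
import numpy as np

def prune_by_magnitude(weights, mask, fraction=0.2):
    """Zero out the smallest-magnitude `fraction` of currently unpruned weights."""
    alive = np.abs(weights[mask])
    threshold = np.quantile(alive, fraction)
    return mask & (np.abs(weights) > threshold)

rng = np.random.default_rng(0)
init_weights = rng.normal(size=1000)          # stand-in rewind point
mask = np.ones_like(init_weights, dtype=bool)
weights = init_weights.copy()

for round_idx in range(5):
    weights = weights + 0.01 * rng.normal(size=weights.shape)  # stubbed "training"
    mask = prune_by_magnitude(weights, mask, fraction=0.2)     # prune 20% per round
    weights = np.where(mask, init_weights, 0.0)                # rewind survivors
    print(f"round {round_idx + 1}: density = {mask.mean():.3f}")
```

Five rounds at 20% per round leave roughly 0.8^5 ≈ 33% of the weights, the geometric sparsification schedule typical of lottery-ticket experiments.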