Rare Gems: Finding Lottery Tickets at Initialization
Authors: Kartik Sreenivasan, Jy-yong Sohn, Liu Yang, Matthew Grinde, Alliot Nagle, Hongyi Wang, Eric Xing, Kangwook Lee, Dimitris Papailiopoulos
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the experimental results for the performance of GEM-MINER across various tasks. |
| Researcher Affiliation | Collaboration | Carnegie Mellon University; Mohamed Bin Zayed University of Artificial Intelligence; Petuum, Inc.; University of Wisconsin-Madison |
| Pseudocode | Yes | Algorithm 1: GEM-MINER |
| Open Source Code | Yes | Our codebase can be found at https://github.com/ksreenivasan/pruning_is_enough. |
| Open Datasets | Yes | We evaluate our algorithm on (Task 1) CIFAR-10 classification... (Task 2) Tiny ImageNet classification... (Task 3) Finetuning on the Caltech-101 [7] dataset... and (Task 4) CIFAR-100 classification... |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about a validation dataset split (e.g., percentages or sample counts for a validation set) in the main text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications). |
| Software Dependencies | No | The paper mentions some algorithms and optimizers used (e.g., Adam), but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | If a network reaches its best accuracy after E epochs of dense training, then we run GEM-MINER for E epochs from random init to get a sparse subnetwork at initialization, and then run weight training on the sparse subnetwork for another E epochs. For the CIFAR-10 MobileNet-V2 experiments, we apply GEM-MINER for 300 epochs and then finetune the sparse model for another 300 epochs to reach a 98.6% sparse model. |
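The Experiment Setup row describes a two-phase schedule: run GEM-MINER for E epochs on a randomly initialized network to find a sparse mask, then train only the surviving weights for another E epochs. The sketch below illustrates that schedule in PyTorch-style code. It is a minimal, hypothetical rendering: the mask-search phase uses a generic score-based straight-through gate as a stand-in for Algorithm 1 (GEM-MINER), whose exact update rule and sparsity control are not reproduced here, and all class, function, and variable names are illustrative assumptions rather than the authors' API (see their codebase linked above for the real implementation).

```python
# Hypothetical sketch of the two-phase "E epochs of mask search + E epochs of
# weight training" schedule quoted in the Experiment Setup row. The mask-search
# step is a generic stand-in for Algorithm 1 (GEM-MINER), not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by a learnable per-weight score."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.score = nn.Parameter(torch.zeros_like(self.weight))  # mask logits
        self.register_buffer("mask", torch.ones_like(self.weight))
        self.frozen = False  # True during phase 2 (weight training)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.frozen:
            m = self.mask
        else:
            # Soft gate in [0, 1], hard-rounded in the forward pass with a
            # straight-through gradient. The actual algorithm also steers the
            # gates toward a target sparsity; that part is omitted here.
            p = torch.sigmoid(self.score)
            m = (p > 0.5).float() + p - p.detach()
        return F.linear(x, self.weight * m)


def find_mask_at_init(model: nn.Module, loader, epochs: int, lr: float = 0.1) -> None:
    """Phase 1: optimize only the scores on the frozen random init."""
    masked = [m for m in model.modules() if isinstance(m, MaskedLinear)]
    for m in masked:
        m.weight.requires_grad_(False)  # weights stay at their random init
    opt = torch.optim.Adam([m.score for m in masked], lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    # Freeze the discovered subnetwork for phase 2.
    for m in masked:
        m.mask.copy_((torch.sigmoid(m.score) > 0.5).float())
        m.frozen = True


def train_subnetwork(model: nn.Module, loader, epochs: int, lr: float = 0.01) -> None:
    """Phase 2: ordinary weight training restricted to the sparse subnetwork."""
    masked = [m for m in model.modules() if isinstance(m, MaskedLinear)]
    for m in masked:
        m.weight.requires_grad_(True)
    opt = torch.optim.SGD([m.weight for m in masked], lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()


if __name__ == "__main__":
    # Toy data in place of CIFAR-10; E mirrors the paper's "E epochs + E epochs" schedule.
    E = 3
    loader = [(torch.randn(32, 64), torch.randint(0, 10, (32,))) for _ in range(10)]
    model = nn.Sequential(MaskedLinear(64, 128), nn.ReLU(), MaskedLinear(128, 10))
    find_mask_at_init(model, loader, epochs=E)
    train_subnetwork(model, loader, epochs=E)
```

Note that the real Algorithm 1 also controls the final sparsity level (e.g., the 98.6% figure quoted above), whereas this simplified gate only thresholds the learned scores at 0.5.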