Plant 'n' Seek: Can You Find the Winning Ticket?

Authors: Jonas Fischer, Rebekka Burkholz

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To answer these questions systematically, we derive a framework to plant and hide target architectures within large randomly initialized neural networks. For three common challenges in machine learning, we hand-craft extremely sparse network topologies, plant them in large neural networks, and evaluate state-of-the-art lottery ticket pruning methods. (A sketch of the planting idea follows the table.)
Researcher Affiliation | Academia | Jonas Fischer, Max Planck Institute for Informatics, fischer@mpi-inf.mpg.de; Rebekka Burkholz, CISPA Helmholtz Center for Information Security, burkholz@cispa.de
Pseudocode | Yes | We provide pseudocode and details in App. A.3. (Referring to Algorithm 1 and Algorithm 2 in Appendix A.3.)
Open Source Code | Yes | Our code is publicly available at www.github.com/RelationalML/PlantNSeek.
Open Datasets | Yes | For that, we use SYNFLOW to discover a weak ticket of sparsity 0.01 from VGG16 with multishot pruning, train the weak ticket on CIFAR10, and plant it back into the network. (CIFAR10 is a publicly available benchmark dataset.)
Dataset Splits | No | To assess the accuracy and mean squared error, respectively, of the tickets and trained models, we split off 10% of the data as a hold-out test set. (The paper mentions using "validation sets" for convergence but does not specify their size or how they are split from the main data. A sketch of such a hold-out split follows the table.)
Hardware Specification | No | The paper does not specify any particular hardware (CPU, GPU, RAM, etc.) used for running the experiments.
Software Dependencies | No | The paper cites Adam (Kingma & Ba, 2015) as the optimizer but does not specify versions for software libraries or frameworks such as PyTorch, TensorFlow, or Python itself.
Experiment Setup | Yes | To prune by GRASP, SNIP, SYNFLOW, MAGNITUDE, and RANDOM and train the derived tickets, we use Adam (Kingma & Ba, 2015) with a learning rate of 0.001. Training of the discovered tickets was done for 10 epochs across all experiments, where we could always observe convergence of the respective score on the validation sets (accuracy or MSE). We measured loss by MSE and cross-entropy loss, respectively, and used a batch size of 32 for all experiments. (A training-loop sketch using these hyperparameters follows the table.)
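
The planting framework quoted in the Research Type row embeds a hand-crafted sparse network inside a larger, randomly initialized one. Below is a minimal sketch of that idea for a single linear layer, assuming PyTorch. The function name plant_layer and the corner-block placement are illustrative; this is not the paper's Algorithm 1 or 2 from App. A.3, which additionally hide the planted structure so it does not sit in a fixed, easily found block.

    import torch

    def plant_layer(target_weight, large_shape, seed=0):
        # Randomly initialize the weights of the large layer.
        gen = torch.Generator().manual_seed(seed)
        large = torch.randn(large_shape, generator=gen)
        # Plant the sparse target weights into a sub-block and record
        # a ground-truth mask over the planted (nonzero) entries.
        mask = torch.zeros(large_shape)
        r, c = target_weight.shape
        large[:r, :c] = target_weight
        mask[:r, :c] = (target_weight != 0).float()
        return large, mask

The returned mask is the ground truth against which a pruning method's surviving weights can be compared to decide whether the planted ticket was found.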
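For the Dataset Splits row, the quoted 10% hold-out test set corresponds to a standard random split. A minimal sketch, assuming a PyTorch dataset: only the 10% fraction comes from the paper; the seed and the helper name holdout_split are illustrative.

    import torch
    from torch.utils.data import random_split

    def holdout_split(dataset, test_fraction=0.1, seed=0):
        # Split off a hold-out test set (10% per the quoted passage).
        n_test = int(len(dataset) * test_fraction)
        return random_split(
            dataset, [len(dataset) - n_test, n_test],
            generator=torch.Generator().manual_seed(seed),
        )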
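The Experiment Setup row pins down the optimizer, learning rate, epoch count, batch size, and losses. Below is a sketch of a training loop using exactly those hyperparameters, again assuming PyTorch; the model and dataset are placeholders, and whether CrossEntropyLoss or MSELoss applies depends on whether the task is classification or regression.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader

    def train_ticket(model, train_set, epochs=10, lr=1e-3, batch_size=32):
        # Hyperparameters from the quoted setup: Adam, learning rate 0.001,
        # 10 epochs, batch size 32.
        loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()  # nn.MSELoss() for regression tasks
        model.train()
        for _ in range(epochs):
            for inputs, targets in loader:
                opt.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                opt.step()
        return model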