Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks

Authors: Maxwell M. Aladago, Lorenzo Torresani

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our algorithm through experiments on MNIST and CIFAR-10. On MNIST, our randomly weighted LeNet-300-100 (LeCun et al., 1998) obtains a 97.0% test set accuracy when using K = 2 options per connection and 98.2% with K = 8. On CIFAR-10 (Krizhevsky, 2009), our six-layer convolutional network outperforms the traditionally-trained network when selecting from K = 8 fixed random values at each connection.
Researcher Affiliation | Academia | Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA. Correspondence to: Maxwell Mbabilla Aladago <maxwell.m.aladago.gr@dartmouth.edu>.
Pseudocode | No | The paper describes the algorithm mathematically and with a diagram (Figure 1), but it does not include a formal pseudocode block or a section explicitly labeled 'Algorithm'. (A hedged sketch of the selection mechanism follows the table.)
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | On MNIST, we experiment with the LeNet-300-100 (LeCun et al., 1998) architecture... On CIFAR-10 (Krizhevsky, 2009)...
Dataset Splits | Yes | We use 15% and 10% of the training sets of MNIST and CIFAR-10, respectively, for validation. We report performance on the separate test set. (A sketch of such a split follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2019)' but does not specify any version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | All models use a batch size of 128 and stochastic gradient descent with warm restarts (Loshchilov & Hutter, 2017), a momentum of 0.9, and an ℓ2 penalty of 0.0001. When training GS slot machines, we set the learning rate to 0.2 for K ≤ 8 and 0.1 otherwise. We set the learning rate to 0.01 when directly optimizing the weights (training from scratch and finetuning), except when training VGG-19, where we set the learning rate to 0.1. (A training-setup sketch follows the table.)
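
Since the paper provides no pseudocode, below is a minimal PyTorch sketch of the GS ("greedy selection") mechanism for a fully connected layer, reconstructed from the paper's description: K fixed random values per connection, one learnable score per value, and a straight-through estimator so the scores receive gradient. The class name, initialization scales, and score init are our assumptions, not the paper's exact choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotMachineLinear(nn.Module):
    """Sketch of a GS slot-machine layer: each connection holds K fixed
    random candidate weights plus K learnable quality scores; the forward
    pass uses the top-scoring candidate per connection."""

    def __init__(self, in_features, out_features, K=8):
        super().__init__()
        # K fixed random values per connection; buffers are never trained.
        # The Kaiming-style scale here is an assumption.
        scale = (2.0 / in_features) ** 0.5
        self.register_buffer(
            "candidates",
            torch.empty(out_features, in_features, K).uniform_(-scale, scale),
        )
        # One learnable score per candidate value (init scale assumed).
        self.scores = nn.Parameter(0.01 * torch.randn(out_features, in_features, K))

    def forward(self, x):
        # Hard one-hot pick of the highest-scoring candidate per connection.
        hard = F.one_hot(self.scores.argmax(dim=-1), self.scores.size(-1)).float()
        # Straight-through: the forward pass sees `hard`; the backward pass
        # treats the selection as identity in the scores, so score s_ijk
        # receives gradient (dL/dw_ij) * W_ijk for every candidate k.
        sel = hard + self.scores - self.scores.detach()
        weight = (self.candidates * sel).sum(dim=-1)  # effective weights
        return F.linear(x, weight)
```

The identity straight-through pass reproduces the score update described in the paper for GS; the PS (probabilistic sampling) variant would instead sample the candidate index from a softmax over the scores.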
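
The Dataset Splits row fixes the validation fractions at 15% (MNIST) and 10% (CIFAR-10). One way to realize those splits in PyTorch is sketched below; the use of random_split and the seed are assumptions, since the paper does not say how the validation examples were drawn.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Hold out 15% of MNIST's training set for validation.
mnist = datasets.MNIST("data", train=True, download=True, transform=to_tensor)
n_val = int(0.15 * len(mnist))
mnist_train, mnist_val = random_split(
    mnist, [len(mnist) - n_val, n_val],
    generator=torch.Generator().manual_seed(0),  # seed is an assumption
)

# Hold out 10% of CIFAR-10's training set for validation.
cifar = datasets.CIFAR10("data", train=True, download=True, transform=to_tensor)
n_val = int(0.10 * len(cifar))
cifar_train, cifar_val = random_split(
    cifar, [len(cifar) - n_val, n_val],
    generator=torch.Generator().manual_seed(0),
)
```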
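
Finally, the Experiment Setup row can be wired up as follows. Only the quoted hyperparameters (batch size 128, momentum 0.9, ℓ2 penalty 0.0001, learning rate 0.2 for GS with K ≤ 8) come from the paper; mapping "warm restarts" to PyTorch's CosineAnnealingWarmRestarts and the restart period T_0 are our assumptions.

```python
import torch
from torch.utils.data import DataLoader

# Stand-in model; substitute the SlotMachineLinear sketch above.
model = torch.nn.Linear(784, 10)

# Quoted from the paper: lr 0.2 (GS, K <= 8), momentum 0.9, l2 penalty 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.2,
                            momentum=0.9, weight_decay=1e-4)

# "Warm restarts" (Loshchilov & Hutter, 2017) is SGDR; the scheduler class
# and T_0 below are assumptions -- the paper gives no schedule constants.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=25)

# Quoted batch size of 128 (mnist_train from the split sketch above).
loader = DataLoader(mnist_train, batch_size=128, shuffle=True)
```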