Slot Machines: Discovering Winning Combinations of Random Weights in Neural Networks
Authors: Maxwell M Aladago, Lorenzo Torresani
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our algorithm through experiments on MNIST and CIFAR-10. On MNIST, our randomly weighted Lenet-300-100 (Lecun et al., 1998) obtains a 97.0% test set accuracy when using K = 2 options per connection and 98.2% with K = 8. On CIFAR10 (Krizhevsky, 2009), our six layer convolutional network outperforms the traditionally-trained network when selecting from K = 8 fixed random values at each connection. |
| Researcher Affiliation | Academia | 1 Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA. Correspondence to: Maxwell Mbabilla Aladago <maxwell.m.aladago.gr@dartmouth.edu>. |
| Pseudocode | No | The paper describes the algorithm mathematically and with a diagram (Figure 1), but it does not include a formal pseudocode block or a section explicitly labeled 'Algorithm'. (A hedged code sketch of the selection mechanism is provided after this table.) |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | On MNIST, we experiment with the Lenet-300-100 (Lecun et al., 1998) architecture... On CIFAR10 (Krizhevsky, 2009)... |
| Dataset Splits | Yes | We use 15% and 10% of the training sets of MNIST and CIFAR-10, respectively, for validation. We report performance on the separate test set. (A sketch of one way to construct such splits follows the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2019)' but does not specify any version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | All models use a batch size of 128 and stochastic gradient descent with warm restarts (Loshchilov & Hutter, 2017), a momentum of 0.9 and an ℓ2 penalty of 0.00014. When training GS slot machines, we set the learning rate to 0.2 for K ≤ 8 and 0.1 otherwise. We set the learning rate to 0.01 when directly optimizing the weights (training from scratch and finetuning) except when training VGG-19, where we set the learning rate to 0.1. (A hedged PyTorch sketch of this optimizer configuration follows the table.) |
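To make the selection mechanism referenced in the "Research Type" and "Pseudocode" rows concrete, the following is a minimal PyTorch sketch of a linear layer that chooses one of K fixed random values per connection through learnable quality scores with greedy selection (GS). The class name `SlotLinear`, the scaling of the random options, the softmax over scores, and the straight-through gradient trick are illustrative assumptions; the paper's exact parameterization and update rule may differ, and this is not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotLinear(nn.Module):
    """Linear layer whose weights are selected from K fixed random options
    per connection via learnable quality scores (greedy selection, GS).
    Sketch only; the init scaling and gradient estimator are assumptions."""

    def __init__(self, in_features, out_features, k=8):
        super().__init__()
        # K fixed random candidate values per connection; never updated.
        self.register_buffer(
            "options",
            torch.randn(out_features, in_features, k) / in_features ** 0.5,
        )
        # Learnable quality scores, one per candidate value.
        self.scores = nn.Parameter(torch.rand(out_features, in_features, k))
        # Bias kept as an ordinary trainable parameter here (an assumption).
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        probs = F.softmax(self.scores, dim=-1)
        # Greedy selection: pick the option with the highest score.
        idx = probs.argmax(dim=-1, keepdim=True)
        hard = torch.zeros_like(probs).scatter_(-1, idx, 1.0)
        # Straight-through trick: forward uses the hard one-hot choice,
        # backward flows gradients to the scores through the soft probabilities.
        sel = hard + probs - probs.detach()
        weight = (self.options * sel).sum(dim=-1)
        return F.linear(x, weight, self.bias)
```

For example, a Lenet-300-100-style MLP could be assembled by replacing each `nn.Linear` with `SlotLinear` (e.g. `SlotLinear(784, 300, k=8)`); only the scores and biases are trained while the candidate weights stay fixed.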
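The "Dataset Splits" row reports that 15% of the MNIST and 10% of the CIFAR-10 training sets are held out for validation. Below is a small sketch of one way to build such splits with torchvision; the helper name `train_val_split`, the fixed seed, and the use of `random_split` are assumptions, since the quoted text does not say how the validation subsets were drawn.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

def train_val_split(dataset, val_fraction, seed=0):
    # Hold out a fraction of the training set for validation (sketch;
    # the paper does not specify the split procedure or seed).
    n_val = int(len(dataset) * val_fraction)
    n_train = len(dataset) - n_val
    return random_split(dataset, [n_train, n_val],
                        generator=torch.Generator().manual_seed(seed))

mnist_train = datasets.MNIST("data", train=True, download=True,
                             transform=transforms.ToTensor())
mnist_tr, mnist_val = train_val_split(mnist_train, 0.15)   # 15% validation

cifar_train = datasets.CIFAR10("data", train=True, download=True,
                               transform=transforms.ToTensor())
cifar_tr, cifar_val = train_val_split(cifar_train, 0.10)   # 10% validation
```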
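The "Experiment Setup" row translates fairly directly into a PyTorch optimizer and scheduler. The sketch below encodes the quoted batch size, learning rates, momentum, and ℓ2 penalty; the helper name `make_optimizer`, the `mode`/`arch` arguments, and the warm-restart period `T_0` are illustrative assumptions, as the quoted passage does not give the restart schedule.

```python
import torch

def make_optimizer(params, mode="gs_slot_machine", k=8, arch="lenet"):
    # Learning rates follow the quoted setup; 'mode' and 'arch' are
    # illustrative switches, not names used in the paper.
    if mode == "gs_slot_machine":
        lr = 0.2 if k <= 8 else 0.1            # GS slot machines
    else:
        lr = 0.1 if arch == "vgg19" else 0.01  # training from scratch / finetuning
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=0.00014)
    # SGD with warm restarts (Loshchilov & Hutter, 2017); T_0 is an assumption.
    sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)
    return opt, sched

# All models use a batch size of 128, e.g.:
# loader = torch.utils.data.DataLoader(mnist_tr, batch_size=128, shuffle=True)
```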