Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

Authors: Keitaro Sakamoto, Issei Sato

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first show some empirical results because our new findings about winning tickets in this empirical analysis motivate the PAC-Bayesian analysis for LTH; thus, we interpret the results on the basis of the PAC-Bayesian perspective in the next section. We empirically investigated the properties of winning tickets mainly related to the learning rate.
Researcher Affiliation | Academia | Keitaro Sakamoto, The University of Tokyo (sakakei-1999@g.ecc.u-tokyo.ac.jp); Issei Sato, The University of Tokyo (sato@g.ecc.u-tokyo.ac.jp)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We followed the experimental setting of Frankle and Carbin [10] and used a modified version of the OpenLTH repository [9].
Open Datasets | Yes | Figure 1 shows the test accuracy on clean and label-noise datasets of sparse subnetworks produced by IMP with different learning rates. As for the no-label-noise setting (green line), there is an accuracy drop at some point as the learning rate is increased. We added the original unpruned baseline (dashed green line) to discuss if the subnetwork is a winning ticket, and this baseline shows no such accuracy drop when increasing the learning rate. ... Test accuracy on CIFAR10 whose labels are randomly flipped... ResNet20 + CIFAR10 test acc. 89.1% ... ResNet20 + CIFAR100 test acc. 61.4% ...
Dataset Splits | No | The paper mentions using CIFAR10 and CIFAR100 but does not explicitly provide specific percentages or counts for training, validation, or test dataset splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models, for running its experiments.
Software Dependencies | No | The paper mentions tools like PyHessian and the OpenLTH repository but does not specify their version numbers or other software dependencies with specific versions.
Experiment Setup | Yes | We followed the experimental setting of Frankle and Carbin [10]... Figure 1: Test accuracy on CIFAR10 whose labels are randomly flipped... when ResNet20 (90% sparse) and VGG16 (99% sparse) are trained. These subnetworks are produced by IMP with various learning rates. ... Table 1: Trace of the Hessian for ResNet20 and VGG16 trained on CIFAR10 and CIFAR100. We used three optimizers: SAM, NVRM, and SGD. The hyperparameters ρ of SAM and b of NVRM are chosen from {0.05, 0.1, 0.2, 0.5} and {0.014, 0.018, 0.022, 0.026}, respectively, with the highest test accuracy. As a baseline, we show the results of SGD with a small learning rate, and the learning rate is also set to small for SAM and NVRM. ... Figure 3: Test accuracy of ResNet20 trained on CIFAR10 when a regularization term is added. The top row shows the results of the large learning rate (0.1) and the bottom row shows those of the small learning rate (0.01). Left: l2_init; right: l2_norm. The dashed red line in the top left shows the unpruned baseline with the small learning rate (0.01). ... Regularization hyperparameter: λ
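The rows above repeatedly refer to subnetworks "produced by IMP" (iterative magnitude pruning, the procedure of Frankle and Carbin for finding winning tickets). For readers unfamiliar with it, here is a minimal, hypothetical sketch of the train-prune-rewind loop on a toy flat weight vector; the function names and the stand-in `train_fn` are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of iterative magnitude pruning (IMP) with rewinding
# to the original initialization, in the style of Frankle & Carbin.
# The "model" here is just a flat NumPy weight vector, for illustration.
import numpy as np

def imp_round(weights, mask, prune_fraction=0.2):
    """Prune the smallest-magnitude surviving weights; return the new mask."""
    surviving = np.abs(weights[mask])
    k = int(len(surviving) * prune_fraction)  # how many weights to remove
    if k == 0:
        return mask
    threshold = np.sort(surviving)[k - 1]     # k-th smallest magnitude
    return mask & (np.abs(weights) > threshold)

def iterative_magnitude_pruning(init_weights, train_fn, rounds=3,
                                prune_fraction=0.2):
    """Repeat: train the masked net, prune by magnitude, rewind to init."""
    mask = np.ones_like(init_weights, dtype=bool)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask, mask)  # train masked subnet
        mask = imp_round(trained, mask, prune_fraction)
    # Rewind surviving weights to their initial values: the candidate ticket.
    return init_weights * mask, mask

# Toy usage with a fake train_fn that merely scales weights uniformly.
rng = np.random.default_rng(0)
w0 = rng.normal(size=100)
ticket, mask = iterative_magnitude_pruning(w0, lambda w, m: w * 2.0, rounds=3)
print(int(mask.sum()))  # 52 of 100 weights survive three 20% pruning rounds
```

The learning-rate sensitivity the checklist highlights enters through `train_fn`: the paper's empirical point is that the quality of the resulting ticket depends strongly on the learning rate used during these training rounds.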