Neural Parameter Allocation Search
Authors: Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments include a wide variety of tasks and networks in order to demonstrate the broad applicability of NPAS and SSNs. We benchmark SSNs for LB- and HB-NPAS and show they create high-performing networks when either using few parameters or adding network capacity. |
| Researcher Affiliation | Collaboration | Boston University, ETH Zürich, MIT-IBM Watson AI Lab |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | To further aid reproducibility, we publicly release our SSN code at https://github.com/BryanPlummer/SSN. |
| Open Datasets | Yes | We evaluate SSNs on CIFAR-10 and CIFAR-100 (Krizhevsky, 2009)... and ImageNet (Deng et al., 2009)... We benchmark on Flickr30K (Young et al., 2014)... and COCO (Lin et al., 2014)... We use SQuAD v1.1 (Rajpurkar et al., 2016)... and SQuAD v2.0 (Rajpurkar et al., 2018)... |
| Dataset Splits | Yes | We benchmark on Flickr30K (Young et al., 2014) which contains 30K/1K/1K images for training/testing/validation, and COCO (Lin et al., 2014), which contains 123K/1K/1K images for training/testing/validation. |
| Hardware Specification | Yes | When using 64 V100 GPUs for training WRN-50-2 on ImageNet, we see a 1.04× performance improvement in runtime per epoch when using SSNs with 10.5M parameters (15% of the original model). |
| Software Dependencies | No | The paper mentions 'PyTorch's official implementation' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For each model, we use the authors' implementation and hyperparameters, unless noted (more details in Appendix A). Specifically, on CIFAR we train our model using a batch size of 128 for 200 epochs with weight decay set at 5e-4 and an initial learning rate of 0.1 which we decay using a gamma of 0.2 at 60, 120, and 160 epochs. |
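
The CIFAR hyperparameters quoted in the Experiment Setup row map directly onto a standard PyTorch training configuration. The sketch below only illustrates those quoted values (batch size 128, 200 epochs, weight decay 5e-4, initial learning rate 0.1 decayed by 0.2 at epochs 60, 120, and 160); the choice of SGD with momentum 0.9, the dataset transform, and the model passed in are assumptions for illustration, and the SSN parameter-sharing machinery from the authors' repository is omitted.

```python
# Hedged sketch of the quoted CIFAR training schedule, not the authors' code.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def build_cifar_training(model: nn.Module):
    """Return a dataloader, optimizer, and LR schedule matching the quoted setup."""
    transform = T.Compose([T.ToTensor()])  # assumption: augmentation details are not quoted
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=transform)
    loader = torch.utils.data.DataLoader(
        train_set, batch_size=128, shuffle=True)   # batch size 128

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,                                    # initial learning rate 0.1
        momentum=0.9,                              # assumption: momentum value is not quoted
        weight_decay=5e-4)                         # weight decay 5e-4

    # Decay the learning rate by gamma=0.2 at epochs 60, 120, and 160
    # over a 200-epoch run, as stated in the quoted setup.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[60, 120, 160], gamma=0.2)
    return loader, optimizer, scheduler
```

A training loop would then iterate for 200 epochs, calling `scheduler.step()` once per epoch; anything specific to SSNs or the released code at https://github.com/BryanPlummer/SSN is out of scope for this sketch.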