Learning Neural Network Subspaces
Authors: Mitchell Wortsman, Maxwell C Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present experimental results across benchmark datasets for image classification (CIFAR-10 (Krizhevsky et al., 2009), Tiny-ImageNet (Le & Yang, 2015), and ImageNet (Deng et al., 2009)) for various residual networks (He et al., 2016; Zagoruyko & Komodakis, 2016). |
| Researcher Affiliation | Collaboration | 1University of Washington (work completed during internship at Apple). 2Apple. |
| Pseudocode | Yes | Algorithm 1 Train Subspace (a minimal sketch of this procedure follows the table) |
| Open Source Code | Yes | Code available at https://github.com/apple/learning-subspaces. |
| Open Datasets | Yes | The CIFAR-10 (Krizhevsky et al., 2009), Tiny-ImageNet (Le & Yang, 2015), and ImageNet (Deng et al., 2009) experiments follow Frankle et al. (2020) |
| Dataset Splits | No | The paper describes training parameters and mentions standard datasets but does not explicitly state the use of a validation set or its specific split percentage/methodology for its experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions software like PyTorch and MXNet in its references, but it does not specify the version numbers for any software components used in its experiments. |
| Experiment Setup | Yes | The CIFAR-10 (Krizhevsky et al., 2009) and Tiny-ImageNet (Le & Yang, 2015) experiments follow Frankle et al. (2020) in training for 160 epochs using SGD with learning rate 0.1, momentum 0.9, weight decay 1e-4, and batch size 128. For ImageNet we follow Xie et al. (2019) in changing batch size to 256 and weight decay to 5e-5. All experiments are conducted with a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016) with 5 epochs of warmup and without further regularization (unless explicitly mentioned). (See the optimizer/scheduler sketch after the table.) |
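The paper's Algorithm 1 (Train Subspace) trains the endpoints of a subspace jointly: each batch, a point in the subspace is sampled, the network is evaluated at that point, and gradients flow back to all endpoints. Below is a minimal sketch of this idea for a one-dimensional subspace (a line) in PyTorch; the class `LineLinear` and its attribute names are illustrative assumptions, not taken from the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LineLinear(nn.Module):
    """Linear layer whose weight lies on a line between two learned endpoints.

    Illustrative sketch only; not the paper's released implementation.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w2 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.alpha = 0.5  # position on the line; resampled each batch

    def forward(self, x):
        # Evaluate at the sampled point; gradients reach both endpoints.
        w = self.alpha * self.w1 + (1 - self.alpha) * self.w2
        return F.linear(x, w)

# One training step in the spirit of Algorithm 1:
layer = LineLinear(32, 10)
opt = torch.optim.SGD(layer.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))

layer.alpha = torch.rand(1).item()   # sample a point on the line
loss = F.cross_entropy(layer(x), y)  # forward pass at the sampled point
opt.zero_grad()
loss.backward()                      # both endpoints receive gradients
opt.step()
```

Higher-dimensional subspaces (e.g., simplexes) follow the same pattern with more endpoints and a sampled convex combination; the paper additionally regularizes the endpoints to remain diverse.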
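For concreteness, the CIFAR-10 / Tiny-ImageNet optimization setup quoted above maps onto a standard PyTorch configuration roughly as follows. This is a sketch under stated assumptions, not the released training script: `model` is a stand-in for the residual networks used in the paper, and the warmup-plus-cosine schedule is written out by hand rather than taken from the repo.

```python
import math
import torch
import torchvision

# Stand-in architecture; the paper evaluates various residual networks
# (He et al., 2016; Zagoruyko & Komodakis, 2016).
model = torchvision.models.resnet18(num_classes=10)

# CIFAR-10 / Tiny-ImageNet setup: SGD, lr 0.1, momentum 0.9, weight decay 1e-4.
# (For ImageNet the paper changes batch size to 256 and weight decay to 5e-5.)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

total_epochs, warmup_epochs = 160, 5

def lr_factor(epoch):
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs      # linear warmup over 5 epochs
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * t))  # cosine annealing

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)

for epoch in range(total_epochs):
    # ... standard training loop with batch size 128 ...
    scheduler.step()
```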