Hyperparameter Optimization through Neural Network Partitioning

Authors: Bruno Kacper Mlodozeniec, Matthias Reisser, Christos Louizos

ICLR 2023

Reproducibility assessment: each variable is listed below with its result and the supporting LLM response.

Research Type: Experimental
LLM response: We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run while being significantly computationally cheaper than alternative methods aiming to optimize the marginal likelihood for neural networks. Lastly, we also focus on optimizing hyperparameters in federated learning, where retraining and cross-validation are particularly challenging.

Researcher Affiliation: Collaboration
LLM response: University of Cambridge, Qualcomm AI Research; bkm28@cam.ac.uk, {mreisser,clouizos}@qti.qualcomm.com

Pseudocode: No
LLM response: The paper does not contain structured pseudocode or algorithm blocks. It describes procedures in paragraph text and illustrates concepts with figures.

Open Source Code: Yes
LLM response: Gregory Benton, Marc Finzi, and Andrew G Wilson. Augerino, GitHub, commit fd542eb90ac6b1c0959156c1f6ad2ba8719d8572. https://github.com/g-benton/learning-invariances/ (on page 18)

Open Datasets: Yes
LLM response: For datasets, we consider MNIST, CIFAR10, Tiny Imagenet along with rot CIFAR10 and rot Tiny Imagenet, variants where the datapoints are randomly rotated at the beginning of training by angles sampled uniformly from [-π, π] (Immer et al., 2022).
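
The rotated variants quoted above amount to rotating each training image once, by an angle drawn uniformly from [-π, π], before training starts. A minimal sketch of such a wrapper, assuming torchvision transforms and per-example angles fixed at construction time (not the authors' code):

```python
import torch
from torchvision import datasets, transforms
from torchvision.transforms import functional as TF


class RotatedDataset(torch.utils.data.Dataset):
    """Rotates every image of a base dataset by a fixed, randomly drawn angle."""

    def __init__(self, base, seed=0):
        self.base = base
        g = torch.Generator().manual_seed(seed)
        # One angle per datapoint, sampled once "at the beginning of training";
        # uniform in [-pi, pi], expressed in degrees for torchvision's rotate.
        self.angles = (torch.rand(len(base), generator=g) * 2.0 - 1.0) * 180.0

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        return TF.rotate(x, self.angles[idx].item()), y


# e.g. a "rot CIFAR10" variant
rot_cifar10 = RotatedDataset(
    datasets.CIFAR10("data", train=True, download=True, transform=transforms.ToTensor())
)
```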

Dataset Splits: Yes
LLM response: For our federated experiments, we split the 50k MNIST and 45k CIFAR10 training data-points across 100 clients in a non-i.i.d. way to create the typical challenge to federated learning experiments.
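
The quote does not specify how the non-i.i.d. split is constructed; a common choice in the federated learning literature is label-sorted sharding. The sketch below is a hypothetical illustration of that scheme (the function name and shard counts are illustrative, not from the paper):

```python
import numpy as np


def noniid_label_shard_split(labels, num_clients=100, shards_per_client=2, seed=0):
    """Sort indices by label, cut them into contiguous shards, and hand each
    client a few shards so that every client only sees a handful of classes."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels, kind="stable")
    shards = np.array_split(order, num_clients * shards_per_client)
    shard_ids = rng.permutation(len(shards))
    return [
        np.concatenate([shards[s] for s in shard_ids[c::num_clients]])
        for c in range(num_clients)
    ]


# e.g. split the 50k MNIST training points across 100 clients
# client_indices = noniid_label_shard_split(np.asarray(mnist_train.targets))
```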

Hardware Specification: Yes
LLM response: The empirical timing measurements on an NVIDIA RTX 3080 (10 GB) GPU are shown in Table 8. We used a batch-size of 250 and 200 for the MNIST and CIFAR10 experiments respectively, and 20 augmentation samples, just like in our main experiments in Table 1 and Figure 3.
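
Reproducing such per-step timings on a GPU requires synchronizing before reading the clock, since CUDA kernels launch asynchronously. A small helper sketch for measuring the average time of a training step (names and warm-up counts are illustrative assumptions):

```python
import time
import torch


def average_step_time(model, loss_fn, optimizer, batch, n_warmup=10, n_timed=50):
    """Average wall-clock seconds per optimization step on the current device."""
    x, y = batch
    for i in range(n_warmup + n_timed):
        if i == n_warmup:              # start timing only after warm-up steps
            torch.cuda.synchronize()
            start = time.perf_counter()
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    torch.cuda.synchronize()           # wait for all queued kernels to finish
    return (time.perf_counter() - start) / n_timed
```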

Software Dependencies: Yes
LLM response: Specifically, we implement modifications to the PyTorch (Paszke et al., 2019) provided optimizers that allow us to track per-partition momenta, number of steps, etc.
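
The exact optimizer modifications are not reproduced in the quote. One simple way to keep independent momenta and step counts per partition, without touching PyTorch internals, is to give each parameter partition its own optimizer instance; the sketch below illustrates that idea and is not the authors' implementation:

```python
import torch


def make_partition_optimizers(param_partitions, lr=1e-3, weight_decay=3e-4):
    """param_partitions: list of parameter lists, one per network partition.
    Each partition gets its own Adam instance, so its moment estimates
    (momenta) and step counters evolve independently."""
    return [
        torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
        for params in param_partitions
    ]


# After a backward pass, step only the optimizers whose partitions should be
# updated on the current batch, so each partition's state stays separate:
# for opt in optimizers_to_update:
#     opt.step()
#     opt.zero_grad()
```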

Experiment Setup: Yes
LLM response: Input selection experiments: for the model selection (non-differentiable) input selection experiments, we train all variants with Adam with a learning rate of 0.001 and a batch-size of 256 for 10,000 iterations. For both Laplace and partitioned networks, we do early stopping based on the marginal likelihood objective (LML for partitioned networks). We use weight-decay 0.0003 in both cases.
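
A compact sketch of that reported setup (Adam, learning rate 0.001, batch-size 256, 10,000 iterations, weight-decay 0.0003, early stopping on a marginal-likelihood-style objective); `model`, `train_loader`, `loss_fn`, and `objective` are placeholders standing in for the paper's components, not its actual code:

```python
import torch


def train_with_early_stopping(model, train_loader, loss_fn, objective,
                              num_iters=10_000, eval_every=500):
    """train_loader is assumed to be built with batch_size=256; `objective`
    stands in for the (L)ML criterion used for early stopping."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=3e-4)
    best_score, best_state = -float("inf"), None
    batches = iter(train_loader)
    for step in range(num_iters):
        try:
            x, y = next(batches)
        except StopIteration:          # restart the loader when an epoch ends
            batches = iter(train_loader)
            x, y = next(batches)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if (step + 1) % eval_every == 0:
            score = objective(model)
            if score > best_score:     # keep the checkpoint with the best objective
                best_score = score
                best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```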