The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

Authors: Agustinus Kristiadi, Felix Dangel, Philipp Hennig

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show this numerically in Table 1. We train a network in the Cartesian parametrization and obtain its log Z. Then we reparametrize the net with Weight Norm and naïvely compute log Z again. These log Z's are different because Weight Norm introduces more parameters than the Cartesian parametrization, even though the degrees of freedom are the same. Moreover, the Hessian determinant is not invariant under autodiff. However, when transformed as argued above, log Z is trivially invariant." "Table 2: Test accuracies, averaged over 5 random seeds." "Table 3: Hessian-based sharpness measures can change under reparametrization without affecting the model's generalization (results on CIFAR-10)." "E.3.1 Experiment Setup: We use the toy regression dataset of size 150. Training inputs are sampled uniformly from [0, 8], while training targets are obtained via y = sin(x) + ε, where ε ∼ N(0, 0.3²)." (See the first two sketches below this table.)
Researcher Affiliation | Academia | Agustinus Kristiadi and Felix Dangel (Vector Institute, University of Tübingen), {akristiadi, fdangel}@vectorinstitute.ai; Philipp Hennig (University of Tübingen, Tübingen AI Center), philipp.hennig@uni-tuebingen.de
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper makes no statement about releasing source code and gives no link to a code repository for the described methodology.
Open Datasets | Yes | "For MNIST and FMNIST, the network is LeNet. Meanwhile, we use the WideResNet-16-4 model for CIFAR-10 and -100."
Dataset Splits | No | The paper mentions "test accuracies" and a "toy regression dataset of size 150" whose "training inputs are sampled uniformly from [0, 8]". While this indicates how the data are used for training and testing, it gives no specifics on training/validation/test splits (e.g., percentages, sample counts, or an explicit cross-validation setup).
Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., GPU or CPU models, memory sizes, or cloud instance types); it only acknowledges, generically, the "resources used in preparing this research".
Software Dependencies | No | The paper mentions PyTorch, TensorFlow, and JAX as standard deep-learning libraries, but gives no version numbers for these or any other software components used in its experiments, which reproducibility would require.
Experiment Setup | Yes | "For ADAM, we use the default setting suggested by Kingma and Ba [39]. For SGD, we use the commonly-used learning rate of 0.1 with Nesterov momentum 0.9 [26]. The cosine annealing method is used to schedule the learning rate for 100 epochs." (See the last sketch below this table.)
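
The non-invariance of the autodiff Hessian quoted under Research Type can be seen in a few lines. The sketch below uses a bijective 1-D reparametrization w = exp(u) instead of Weight Norm, so that parameter counts match and the Jacobian correction is a scalar; it is a minimal illustration under these simplifying assumptions (toy loss, hypothetical names), not the paper's experiment:

```python
import torch

# Toy 1-D "loss" with minimum at w* = 2
def loss_w(w):
    return 0.5 * ((w - 2.0) ** 2).sum()

# The same model in new coordinates u, via the bijection w = exp(u)
def loss_u(u):
    return loss_w(torch.exp(u))

w_star = torch.tensor([2.0])
u_star = torch.log(w_star)  # the same minimum, expressed in u-coordinates

H_w = torch.autograd.functional.hessian(loss_w, w_star)  # = 1
H_u = torch.autograd.functional.hessian(loss_u, u_star)  # = (dw/du)^2 * H_w = 4 at the minimum

print(H_w.item(), H_u.item())  # differ: the autodiff Hessian determinant is not invariant

# Correcting det H by the squared Jacobian restores the invariant quantity; since a
# Laplace-approximated log Z contains -0.5 * log det H, this correction is what makes
# a properly transformed log Z parametrization-independent.
J = torch.exp(u_star)       # dw/du at u*
print((H_u / J**2).item())  # equals H_w again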
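
The toy regression data described in the E.3.1 quote under Research Type can be generated as follows; the seed and variable names are assumptions, not the paper's code:

```python
import torch

torch.manual_seed(0)  # hypothetical seed; the paper reports averages over 5 seeds

n = 150
x = torch.rand(n) * 8.0                  # training inputs, uniform on [0, 8]
y = torch.sin(x) + 0.3 * torch.randn(n)  # targets: y = sin(x) + eps, eps ~ N(0, 0.3^2)
```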
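
A sketch of the SGD configuration quoted under Experiment Setup, expressed in PyTorch; the model is a stand-in and the training loop is elided, so this shows only the optimizer and schedule wiring:

```python
import torch

model = torch.nn.Linear(10, 2)  # stand-in; the paper uses LeNet / WideResNet-16-4

# "learning rate of 0.1 with Nesterov momentum 0.9"
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)

# "cosine annealing ... for 100 epochs"
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    # ... forward pass, loss.backward(), and optimizer.step() per mini-batch ...
    scheduler.step()  # advance the cosine schedule once per epoch
```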