The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
Authors: Agustinus Kristiadi, Felix Dangel, Philipp Hennig
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show this numerically in Table 1. We train a network in the Cartesian parametrization and obtain its log Z. Then we reparametrize the net with Weight Norm and naïvely compute log Z again. These log Z values differ because Weight Norm introduces more parameters than the Cartesian parametrization, even though the degrees of freedom are the same. Moreover, the Hessian determinant is not invariant under autodiff. However, when transformed as argued above, log Z is trivially invariant. Table 2: Test accuracies, averaged over 5 random seeds. Table 3: Hessian-based sharpness measures can change under reparametrization without affecting the model's generalization (results on CIFAR-10). E.3.1 Experiment Setup: We use the toy regression dataset of size 150. Training inputs are sampled uniformly from [0, 8], while training targets are obtained via y = sin x + ϵ, where ϵ ∼ N(0, 0.3²). |
| Researcher Affiliation | Academia | Agustinus Kristiadi, Felix Dangel (Vector Institute, University of Tübingen, {akristiadi,fdangel}@vectorinstitute.ai); Philipp Hennig (University of Tübingen, Tübingen AI Center, philipp.hennig@uni-tuebingen.de) |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | For MNIST and FMNIST, the network is LeNet. Meanwhile, we use the WideResNet-16-4 model for CIFAR-10 and -100. |
| Dataset Splits | No | The paper mentions using 'Test accuracies' and a 'toy regression dataset of size 150. Training inputs are sampled uniformly from [0, 8]'. While it indicates data usage for training and testing, it does not provide specific details on training/validation/test splits (e.g., percentages, sample counts, or explicit cross-validation setups). |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU models, memory sizes, or cloud instance types) used for running its experiments. It only mentions general 'Resources used in preparing this research'. |
| Software Dependencies | No | The paper mentions 'PyTorch, TensorFlow, and JAX' as standard deep learning libraries, but it does not specify the version numbers for these or any other software components used in their experiments, which is required for reproducibility. |
| Experiment Setup | Yes | For ADAM, we use the default setting suggested by Kingma and Ba [39]. For SGD, we use the commonly used learning rate of 0.1 with Nesterov momentum 0.9 [26]. The cosine annealing method is used to schedule the learning rate for 100 epochs. |
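The toy regression setup quoted in the Research Type row (Appendix E.3.1) is simple enough to reconstruct. The following is a minimal sketch of how that dataset could be generated; the variable names and the fixed seed are our own, not the paper's:

```python
import numpy as np

# Sketch of the toy regression dataset from E.3.1: 150 inputs sampled
# uniformly from [0, 8], targets y = sin(x) + eps with eps ~ N(0, 0.3^2).
# The seed is arbitrary and not taken from the paper.
rng = np.random.default_rng(0)
n = 150
x = rng.uniform(0.0, 8.0, size=n)
eps = rng.normal(0.0, 0.3, size=n)  # note: std dev 0.3, variance 0.3^2
y = np.sin(x) + eps

print(x.shape, y.shape)  # (150,) (150,)
```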
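The Research Type row also notes that Weight Norm "introduces more parameters than the Cartesian [parametrization], even though the degrees of freedom are the same." A small illustration of that parameter count, assuming the standard Weight Norm form w_i = g_i · v_i / ‖v_i‖ (this is our own sketch, not the paper's code):

```python
import numpy as np

# A Cartesian weight matrix W of shape (out, in) has out*in parameters.
# Weight Norm stores a direction v of the same shape plus a per-row scale
# g, i.e. out*in + out parameters for the same function class.
out_dim, in_dim = 4, 3
W = np.ones((out_dim, in_dim))  # Cartesian: 12 parameters
v = np.ones((out_dim, in_dim))  # Weight Norm direction: 12 parameters
g = np.ones(out_dim)            # Weight Norm scales: 4 extra parameters

# Reparametrized weight: each row is g_i * v_i / ||v_i||.
W_wn = g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)

n_cartesian = W.size
n_weightnorm = v.size + g.size
print(n_cartesian, n_weightnorm)  # 12 16
```

The redundancy (scaling v_i while adjusting g_i leaves W_wn unchanged) is exactly why a naïve log Z computed over the larger parameter vector disagrees with the Cartesian one.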
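The cosine annealing schedule mentioned in the Experiment Setup row can be sketched as follows, assuming the standard Loshchilov–Hutter form η_t = η_min + ½(η_max − η_min)(1 + cos(πt/T)) with the quoted starting rate 0.1 over 100 epochs (the function name and η_min = 0 are our assumptions):

```python
import math

# Cosine annealing from lr_max down to lr_min over total_epochs:
# eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t / T)).
def cosine_lr(epoch, total_epochs=100, lr_max=0.1, lr_min=0.0):
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )

print(round(cosine_lr(0), 4))    # 0.1  (start of training)
print(round(cosine_lr(50), 4))   # 0.05 (halfway)
print(round(cosine_lr(100), 4))  # 0.0  (end of training)
```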