Symmetries in Overparametrized Neural Networks: A Mean Field View

Authors: Javier Maass, Joaquín Fontbona

NeurIPS 2024

Reproducibility assessment. Each entry gives the variable assessed, the result, and the LLM's supporting response (quoting the paper where applicable).

Research Type: Experimental
"We illustrate the validity of our findings as N gets larger, in a teacher-student experimental setting, training a student NN to learn from a WI, SI or arbitrary teacher model through various SL schemes."

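To make the described setting concrete, below is a minimal sketch of a teacher-student pair: a frozen shallow teacher generating labels and a mean-field-scaled shallow student of width N. All names, dimensions, and the 1/N output scaling are illustrative assumptions, not specifics taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2      # input dimension (illustrative)
N = 1000   # student width; the mean-field analysis concerns the N -> infinity limit

# Frozen random teacher: a stand-in for the WI, SI or arbitrary teacher f.
W_t = rng.standard_normal((8, d))
a_t = rng.standard_normal(8)

def teacher(x):
    return np.tanh(x @ W_t.T) @ a_t

# Overparametrized student with mean-field scaling: (1/N) * sum_i a_i * tanh(w_i . x).
W_s = rng.standard_normal((N, d))
a_s = rng.standard_normal(N)

def student(x):
    return np.tanh(x @ W_s.T) @ a_s / N
```
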
Researcher Affiliation: Academia
Javier Maass Martínez, Center for Mathematical Modeling, University of Chile, javier.maass@gmail.com; Joaquín Fontbona, Center for Mathematical Modeling, University of Chile, fontbona@dim.uchile.cl

Pseudocode: No
The paper describes the SGD training dynamics with equations (1) and (5), but it does not include a dedicated pseudocode or algorithm block.

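Since the update rule is only given as equations (1) and (5) in the paper, the following pseudocode-level step is a hedged reconstruction: it assumes τ enters as an L2 penalty and β as the intensity of additive Gaussian noise, which is our reading rather than the paper's verbatim dynamics.

```python
import numpy as np

def sgd_step(params, grad_loss, s, tau, beta, rng):
    """One noisy, L2-regularized minibatch-SGD step (hypothetical reading,
    not a verbatim transcription of equations (1) and (5))."""
    noise = np.sqrt(2.0 * s * beta) * rng.standard_normal(params.shape)
    return params - s * (grad_loss + tau * params) + noise
```
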
Open Source Code: Yes
"Code necessary for replicating the obtained results, as well as a detailed description of our experimental setting, can be sought in the Supp. Mat."

Open Datasets: No
"We consider synthetic data produced in a teacher-student setting... Our data distribution π will be such that (X, Y) ∼ π will satisfy X ∼ N(0, σ_π^2 · Id_2) (with σ_π = 4), and Y = f(X)."

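A minimal data-generation sketch matching the quoted description; σ_π = 4 is taken from the paper, while the dimension, sample count, and the rotation-invariant placeholder teacher are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_pi, d, n = 4.0, 2, 10_000   # sigma_pi = 4 as quoted; d and n are illustrative

def f(x):
    # Placeholder rotation-invariant teacher; the paper's f is a WI, SI or arbitrary model.
    return np.linalg.norm(x, axis=1)

X = sigma_pi * rng.standard_normal((n, d))   # X ~ N(0, sigma_pi^2 * Id)
Y = f(X)                                     # Y = f(X)
```
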
Dataset Splits: No
The paper describes the training process in a teacher-student setting, specifying aspects such as epochs and minibatch SGD, but it does not define explicit train/validation/test splits with percentages or sample counts, since it uses synthetic data.

Hardware Specification: Yes
"All the different experiments were run on Python 3.10, on a Google Colab session consisting (by default) of 2 Intel Xeon virtual CPUs (2.20GHz) and with 13GB of RAM."

Software Dependencies: No
The paper mentions 'Python 3.10', 'objax default SGD training', 'pytorch and jax', and the 'emlp repository'. While a version is given for Python (3.10), no versions are provided for objax, PyTorch, JAX, or emlp.

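Because no library versions are pinned, a replication would first need to record the environment; a small snippet that logs whichever versions are installed, assuming the standard PyPI distribution names:

```python
from importlib.metadata import PackageNotFoundError, version

for pkg in ("objax", "jax", "torch", "emlp"):   # libraries named in the paper
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```
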
Experiment Setup: Yes
"The training parameters were fixed to be (unless explicitly stated otherwise): Step Size: ς ≡ α > 0 (with α = 50 in most experiments), ε_N = 1/N, so that s_k^N = α/N. Regularization parameters: τ = 10^-4 and β = 10^-6. Batch Size: It was chosen to be B = 20. Number of Training Epochs: ... N_e = N·T epochs (iterations)."

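Collecting the quoted values in one place, a hedged configuration sketch: N and the time horizon T are free parameters of the experiments (the values below are placeholders), while the remaining quantities follow the quoted setup.

```python
N = 1000                   # number of neurons (placeholder; the paper varies N)
T = 5.0                    # mean-field time horizon (placeholder)
alpha = 50.0               # step-size scale; alpha = 50 in most experiments
eps_N = 1.0 / N            # eps_N = 1/N
step_size = alpha * eps_N  # s_k^N = alpha / N
tau, beta = 1e-4, 1e-6     # regularization parameters
batch_size = 20            # B = 20
n_epochs = int(N * T)      # N_e = N*T epochs (iterations)
```

With this scaling, N iterations of step size α/N advance one unit of mean-field time, so the iteration count N·T grows linearly with N for a fixed horizon T.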