Symmetries in Overparametrized Neural Networks: A Mean Field View
Authors: Javier Maass, Joaquín Fontbona
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the validity of our findings as N gets larger, in a teacher-student experimental setting, training a student NN to learn from a WI (weakly invariant), SI (strongly invariant) or arbitrary teacher model through various SL (symmetry-leveraging) schemes. |
| Researcher Affiliation | Academia | Javier Maass Martínez, Center for Mathematical Modeling, University of Chile (javier.maass@gmail.com); Joaquín Fontbona, Center for Mathematical Modeling, University of Chile (fontbona@dim.uchile.cl) |
| Pseudocode | No | The paper describes the SGD training dynamics with equations (1) and (5), but it does not include a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | Code necessary for replicating the obtained results, as well as a detailed description of our experimental setting, can be sought in the Supp Mat. |
| Open Datasets | No | We consider synthetic data produced in a teacher-student setting... Our data distribution π will be such that (X, Y) ~ π will satisfy X ~ N(0, σ_π² · I_{d₂}) (with σ_π = 4), and Y = f(X). (See the data-generation sketch after the table.) |
| Dataset Splits | No | The paper describes the training process in a teacher-student setting, specifying aspects like epochs and minibatch SGD, but it does not define explicit train/validation/test dataset splits with percentages or sample counts, as it uses synthetic data. |
| Hardware Specification | Yes | All the different experiments were run on Python 3.10, on a Google Colab session consisting (by default) of 2 Intel Xeon virtual CPUs (2.20GHz) and with 13GB of RAM. |
| Software Dependencies | No | The paper mentions 'Python 3.10', 'objax default SGD training', 'pytorch and jax', and the 'emlp repository'. While a version is given for Python (3.10), no specific versions are provided for the other libraries such as objax, PyTorch, JAX, or emlp. |
| Experiment Setup | Yes | The training parameters were fixed to be (unless explicitly stated otherwise): Step Size: s^N_k = ε_N·α for some α > 0 (with α = 50 in most experiments), where ε_N = 1/N, so that s^N_k = α/N. Regularization parameters: τ = 10⁻⁴ and β = 10⁶. Batch Size: It was chosen to be B = 20. Number of Training Epochs: ... N_e = N·T epochs (iterations). (See the training-loop sketch after the table.) |
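
Below is a minimal sketch, not the authors' code, of the synthetic teacher-student data described in the Open Datasets row: inputs X drawn from N(0, σ_π² · I_d) with σ_π = 4 and labels Y = f(X) produced by a fixed teacher network. The teacher architecture (a single tanh hidden layer), its width M, and the input dimension d are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2            # input dimension (d2 in the quoted description); assumed value
sigma_pi = 4.0   # standard deviation of the input distribution (from the paper)
M = 10           # hypothetical teacher width

# Fixed teacher parameters (illustrative choice, not from the paper)
W_teacher = rng.normal(size=(M, d))
c_teacher = rng.normal(size=M)

def teacher(X):
    """Hypothetical teacher f: a shallow tanh network applied row-wise."""
    return np.tanh(X @ W_teacher.T) @ c_teacher / M

def sample_batch(n):
    """Draw n i.i.d. pairs (X, Y) with X ~ N(0, sigma_pi^2 I_d) and Y = f(X)."""
    X = sigma_pi * rng.normal(size=(n, d))
    return X, teacher(X)

X, Y = sample_batch(20)   # e.g. one minibatch of size B = 20
```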
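
The following sketch, written under stated assumptions rather than as the paper's implementation, shows how the quoted hyperparameters combine in a penalized, noisy minibatch SGD loop for a mean-field student network with N neurons: step size s^N_k = ε_N·α = α/N with α = 50, batch size B = 20, and N_e = N·T iterations. The squared loss, the ridge penalty τ, the Langevin noise scale √(2s/β), the placeholder teacher, and the values of N, T, d are assumptions; the exponent signs in τ and β follow our reading of the garbled source.

```python
import numpy as np

rng = np.random.default_rng(1)

d = 2                     # input dimension (assumed)
N = 100                   # number of student neurons (kept small here)
alpha = 50.0              # step-size constant used in most experiments
eps_N = 1.0 / N           # scaling parameter eps_N = 1/N
step = eps_N * alpha      # s_k^N = alpha / N
B = 20                    # minibatch size
T = 2                     # mean-field time horizon (illustrative value)
num_iters = N * T         # N_e = N * T iterations
tau, beta = 1e-4, 1e6     # regularization parameters (our reading of the source)
sigma_pi = 4.0            # input standard deviation

c = rng.normal(size=N)          # output weights of the student
W = rng.normal(size=(N, d))     # hidden weights of the student

def student(X):
    """Mean-field parametrization: f_N(x) = (1/N) * sum_i c_i * tanh(w_i . x)."""
    return np.tanh(X @ W.T) @ c / N

for k in range(num_iters):
    # Synthetic minibatch; in the paper Y comes from a fixed teacher network.
    X = sigma_pi * rng.normal(size=(B, d))
    Y = np.sin(X[:, 0])                    # placeholder teacher (assumption)

    H = np.tanh(X @ W.T)                   # (B, N) hidden activations
    resid = (student(X) - Y) / B           # scaled residuals, shape (B,)

    grad_c = H.T @ resid / N                                   # d(loss)/dc_i
    grad_W = ((c / N) * (1 - H ** 2) * resid[:, None]).T @ X   # d(loss)/dw_i

    # Penalized, noisy SGD step (assumed form of the update rule in eqs. (1), (5))
    c -= step * (grad_c + tau * c) - np.sqrt(2 * step / beta) * rng.normal(size=N)
    W -= step * (grad_W + tau * W) - np.sqrt(2 * step / beta) * rng.normal(size=(N, d))
```

With β = 10⁶ the injected noise is negligible, so the loop behaves essentially like plain minibatch SGD with a small ridge penalty, consistent with the experimental description quoted above.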