Neuron birth-death dynamics accelerates gradient descent and converges asymptotically

Authors: Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We implement this non-local dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with finite numbers of units. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated." (A hedged sketch of such a birth-death step appears below the table.)
Researcher Affiliation | Academia | Courant Institute, New York University, New York, USA; Center for Data Science, New York University, New York, USA; Princeton University, Princeton, New Jersey, USA
Pseudocode | Yes | Algorithm 1, "Parameter birth-death dynamics consistent with (13)"
Open Source Code | Yes | "In Fig. 6, we show convergence to the energy minimizer for a mixture of three Gaussians (details and source code are provided in the SM)."
Open Datasets | No | The paper uses illustrative examples such as a mixture of Gaussians and a student-teacher ReLU network, but does not provide concrete access information (link, DOI, or formal citation) for a publicly available dataset; the data appear to be synthetic, constructed or simulated for the experiments.
Dataset Splits | No | The paper does not provide explicit training/validation/test splits (percentages, sample counts, or citations to predefined splits) that would allow the data partitioning to be reproduced.
Hardware Specification | No | The paper does not report the hardware used for the experiments, such as GPU/CPU models, memory, or the computing environment.
Software Dependencies | No | The paper mentions implementations in PyTorch but does not specify version numbers for PyTorch or any other software dependency needed to reproduce the ancillary software environment.
Experiment Setup | No | The paper states that training uses stochastic gradient descent (SGD) with mini-batch estimates, but it omits hyperparameters such as the learning rate, batch size, and number of epochs, as well as other optimizer settings needed to reproduce the experimental setup. (An illustrative training-loop sketch with placeholder values follows the table.)
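
To make the mechanism described in the abstract excerpt concrete, here is a minimal sketch of one birth-death resampling step for a two-layer ReLU network under a squared loss. The per-unit potential estimate, the kill/duplicate rule, and all variable names are assumptions made for illustration; this is not a transcription of the paper's Algorithm 1.

```python
import numpy as np

def birth_death_step(thetas, cs, X, Y, dt, rng):
    """Hedged sketch of a birth-death resampling step for a two-layer network
    f(x) = (1/n) * sum_i cs[i] * relu(thetas[i] @ x).

    thetas : (n, d) hidden-layer weights
    cs     : (n,)   output weights
    X, Y   : mini-batch inputs (batch, d) and targets (batch,)
    dt     : time step controlling the jump probabilities
    rng    : numpy random Generator
    """
    n = thetas.shape[0]
    acts = np.maximum(X @ thetas.T, 0.0)            # (batch, n) ReLU activations
    resid = acts @ cs / n - Y                        # residuals of the squared loss
    # Per-unit potential: sensitivity of the loss to each unit's "mass".
    V = (resid[:, None] * acts).mean(axis=0) * cs    # (n,)
    V -= V.mean()                                    # center so the unit count is conserved on average
    for i in range(n):
        if rng.random() < min(1.0, abs(V[i]) * dt):  # jump with rate ~ |V_i - mean(V)|
            j = rng.integers(n)
            if V[i] > 0:
                # "death": a high-potential unit is replaced by a copy of a random unit
                thetas[i], cs[i] = thetas[j].copy(), cs[j]
            else:
                # "birth": a low-potential unit is duplicated over a randomly chosen slot
                thetas[j], cs[j] = thetas[i].copy(), cs[i]
    return thetas, cs
```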
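
As a usage illustration of the step above, the following loop interleaves it with mini-batch SGD on a student-teacher ReLU problem of the kind the report flags as under-specified. Every hyperparameter (input dimension, teacher and student widths, batch size, learning rate, birth-death time step, iteration count) is a placeholder assumption, not a value reported in the paper, and `birth_death_step` is the hypothetical function from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical student-teacher ReLU setup; all sizes are assumptions.
d, n_teacher, n_student, batch = 10, 5, 100, 128
teacher_w = rng.standard_normal((n_teacher, d))
teacher_c = rng.standard_normal(n_teacher)

def teacher(X):
    # Target function generated by a fixed random ReLU network.
    return np.maximum(X @ teacher_w.T, 0.0) @ teacher_c / n_teacher

thetas = rng.standard_normal((n_student, d))
cs = rng.standard_normal(n_student)

lr, dt, steps = 0.05, 0.01, 2000                     # assumed hyperparameters
for _ in range(steps):
    X = rng.standard_normal((batch, d))
    Y = teacher(X)
    acts = np.maximum(X @ thetas.T, 0.0)             # (batch, n_student)
    resid = acts @ cs / n_student - Y                # (batch,)
    # Mini-batch SGD on both layers (the mean-field 1/n factor is absorbed into lr).
    grad_c = (resid[:, None] * acts).mean(axis=0)
    mask = (acts > 0.0).astype(float)
    grad_t = (resid[:, None, None] * mask[:, :, None] * X[:, None, :]).mean(axis=0) * cs[:, None]
    cs -= lr * grad_c
    thetas -= lr * grad_t
    # Interleave the birth-death resampling step sketched above.
    thetas, cs = birth_death_step(thetas, cs, X, Y, dt, rng)
```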