A theory of weight distribution-constrained learning

Authors: Weishun Zhong, Ben Sorscher, Daniel Lee, Haim Sompolinsky

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To test the theoretical predictions, we use optimal transport theory and information geometry to develop an SGD-based algorithm to find weights that simultaneously learn the input-output task and satisfy the distribution constraint. We show that training in our algorithm can be interpreted as geodesic flows in the Wasserstein space of probability distributions.
Researcher Affiliation | Academia | Weishun Zhong (Harvard and MIT, wszhong@mit.edu); Ben Sorscher (Stanford University, bsorsch@stanford.edu); Daniel D. Lee (Cornell Tech, ddl46@cornell.edu); Haim Sompolinsky (Harvard and Hebrew University, hsompolinsky@mcb.harvard.edu)
Pseudocode | Yes | Table 1: Disco-SGD algorithm. (a) We perform alternating steps of gradient descent along the cross-entropy loss (Eqn. 7), followed by steps along the optimal transport direction (Eqn. 9). (b) An illustration of Eqn. 8. (A schematic code sketch of this alternating scheme appears after the table.)
Open Source Code | No | The paper's checklist states: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See Section 3.' However, Section 3 describes the algorithm but does not provide a direct link to code or explicitly state that code is available in the supplemental material.
Open Datasets | Yes | In particular, we are interested in the case with two synaptic populations that model the excitatory/inhibitory synaptic weights of a biological neuron; hence, m = E, I. We model the excitatory/inhibitory synaptic weights as drawn from two separate lognormal distributions... we map out the 2d parameter space of σE and σI using Eqn. 10, and find that the optimal choice of parameters, which yields the maximum-capacity solution, is close to the experimentally measured values in a recent connectomic study of mouse primary auditory cortex [38]. Citation [38] refers to Levy and Reyes, 'Spatial profile of excitatory and inhibitory synaptic connectivity in mouse primary auditory cortex,' Journal of Neuroscience, 32(16):5609–5619, 2012. (A sketch of this two-population lognormal model appears after the table.)
Dataset Splits | No | The paper uses synthetic data: 'The data consists of pairs {ξ^µ, ζ^µ}, µ = 1, …, P.' While it mentions that 'training data is fixed at P = 4000' and the 'error tolerance for stopping criterion is 1e-4', it does not explicitly define distinct training, validation, and test splits with percentages or sample counts. (A sketch of this synthetic-data setup, under assumed generative details, appears after the table.)
Hardware Specification | No | The paper describes running numerical simulations, but it does not specify any details about the hardware (e.g., GPU models, CPU types, memory) used for these simulations.
Software Dependencies | No | The paper does not provide specific names or version numbers for any software libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | For the simulations shown in Fig. 4, training data is fixed at P = 4000, and the error tolerance for the stopping criterion is 1e-4. We use an initial learning rate η1 = η2 = 0.05, which is then reduced by a factor of 1.5 every 200 epochs. The initial norm of the weight vector is set to β0 = 100. We also perform a hyperparameter search for optimal η1 and η2 ranging from 0.01 to 0.1, and choose the parameters that gave the most stable training and the most consistent results across multiple runs. (A sketch of this training schedule appears after the table.)
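
The Pseudocode row describes an alternating scheme: a gradient step on the task loss, then a step along the optimal transport direction toward the target weight distribution. Below is a minimal sketch of that scheme, assuming a logistic-regression task; the sorting-based transport step and the `sample_target` argument are illustrative stand-ins for the paper's Eqn. 8-9, not its released code.

```python
import numpy as np

def disco_sgd_step(w, X, y, sample_target, eta1=0.05, eta2=0.05):
    """One alternating Disco-SGD update in the spirit of Table 1:
    (i) a gradient step on the cross-entropy loss, then (ii) a step
    toward the target weight distribution along the 1-d optimal
    transport direction (approximated here by sorting)."""
    # (i) task step: logistic cross-entropy gradient, labels y in {0, 1}
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w = w - eta1 * X.T @ (p - y) / len(y)

    # (ii) distribution step: in one dimension the optimal transport map
    # is the monotone rearrangement, so pairing sorted weights with sorted
    # target samples and nudging each weight toward its match moves the
    # empirical weight distribution along a Wasserstein geodesic.
    order = np.argsort(w)
    target = np.sort(sample_target(len(w)))
    w[order] += eta2 * (target - w[order])
    return w
```

Interleaving small steps of (i) and (ii) is what lets the trained weights both solve the task and track the prescribed distribution.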
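
For the two-population excitatory/inhibitory model quoted in the Open Datasets row, a target-distribution sampler might look as follows. The σE, σI values and the excitatory fraction are placeholders rather than the paper's fitted or measured parameters, and the fixed-sign convention (E positive, I negative) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters: the paper maps out the 2d (sigma_E, sigma_I)
# space and reports that the capacity-maximizing pair is close to the
# measurements of [38]; the numbers below are illustrative only.
sigma_E, sigma_I = 1.0, 0.5
frac_E = 0.8   # assumed fraction of excitatory synapses

def sample_target(n):
    """Draw n weights from the mixed E/I lognormal target distribution."""
    n_E = int(round(frac_E * n))
    w_E = rng.lognormal(mean=0.0, sigma=sigma_E, size=n_E)       # E: positive
    w_I = -rng.lognormal(mean=0.0, sigma=sigma_I, size=n - n_E)  # I: negative
    return np.concatenate([w_E, w_I])
```

This sampler plugs directly into the `sample_target` argument of the `disco_sgd_step` sketch above.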
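
The Dataset Splits row notes that the data are synthetic pairs {ξ^µ, ζ^µ}. One plausible setup is sketched below; the Gaussian patterns, the input dimension N, and the {0, 1} label coding are assumptions for illustration (the paper only fixes P = 4000).

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 1000, 4000   # N is illustrative; P = 4000 as quoted from the paper

# Assumed perceptron-style task: i.i.d. Gaussian patterns xi^mu paired
# with random binary labels zeta^mu, coded {0, 1} to match the logistic
# loss in the disco_sgd_step sketch above.
xi = rng.normal(size=(P, N)) / np.sqrt(N)
zeta = rng.integers(0, 2, size=P)
```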
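
Finally, the Experiment Setup row pins down the training schedule. The schematic driver below uses only the quoted values (initial η1 = η2 = 0.05, decay by 1.5 every 200 epochs, initial norm β0 = 100, tolerance 1e-4); the `step_fn` callback and the `max_epochs` cap are added scaffolding.

```python
import numpy as np

def train(w0, step_fn, max_epochs=2000, tol=1e-4):
    """Run the quoted schedule: step_fn(w, eta1, eta2) -> (w, error)
    is assumed to wrap one Disco-SGD update (e.g. disco_sgd_step above)
    and report the current training error."""
    eta1 = eta2 = 0.05                        # initial learning rates
    w = 100.0 * w0 / np.linalg.norm(w0)       # initial norm beta_0 = 100
    for epoch in range(1, max_epochs + 1):
        if epoch % 200 == 0:
            eta1 /= 1.5                       # reduce by 1.5x every 200 epochs
            eta2 /= 1.5
        w, err = step_fn(w, eta1, eta2)
        if err < tol:                         # stopping criterion 1e-4
            break
    return w
```

The paper additionally searched η1 and η2 in [0.01, 0.1] and kept the most stable choice; that outer hyperparameter loop is omitted here.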