Flexible Phase Dynamics for Bio-Plausible Contrastive Learning
Authors: Ezekiel Williams, Colin Bredenberg, Guillaume Lajoie
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems, and show that this form of learning can be made temporally local and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks. |
| Researcher Affiliation | Academia | Department of Mathematics and Statistics, Université de Montréal, Québec, Canada; Mila, Quebec AI Institute, Québec, Canada. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It describes mathematical formulations and theoretical derivations. |
| Open Source Code | Yes | The code used for the experiments can be found at the following Github repository: https://github.com/zek3r/ICML2023. |
| Open Datasets | Yes | Testing was performed on the binarized MNIST (bMNIST) and the Bars And Stripes (BAS) (Fischer & Igel, 2014) datasets for the RBM, and MNIST for the FF-trained network. |
| Dataset Splits | No | Data was divided into segregated train and test sets for the MNIST experiments, but the test set was only used with the forward-forward algorithm. The BAS dataset is small, so the entire ground-truth dataset was used for training and there was no train/test split in that case. The paper mentions train and test sets, but gives no explicit validation set or split details. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments. It only mentions energy consumption in a general sense. |
| Software Dependencies | No | The paper mentions using "vanilla SGD" and "ADAM (Kingma & Ba, 2014)" for optimizers, but does not specify version numbers for these or any other software libraries or programming languages used. |
| Experiment Setup | Yes | For the experiments in Fig. 2, and for all distinct models, learning rates were selected by performing a line search over ten values, equally spaced on a log10 scale, to find the learning rate with the lowest end-of-training training error. The learning rates in Fig. 3 were simply set to 0.05 and 0.0025 for ISD AoL and ISD, respectively. For all experiments on the BAS dataset we trained an RBM with 16 hidden units and 16 visible units. The max and min values for the learning-rate line search for Fig. 2 were 0.04 and 0.001. Phase length was set to T = 100 for ISD, 100 for ISD AoL with fixed phase length, and a mean of 150 for the random-phase-length version. All networks were trained using vanilla SGD with no regularization. All parameters were initialized to white noise with a standard deviation of 0.01. All training runs in Fig. 2 were 10^5 steps long, except for CD1, which was 10^7 gradient steps long (and failed to converge). With the forward-forward algorithm, we trained a network with two hidden layers of 500 units each. ADAM was used for both the standard algorithm and the ISD algorithm. The max and min values for the learning-rate line search were 0.05 and 0.002. All training runs in Fig. 2, for all algorithms, were 120,000 gradient steps long. (Minimal sketches of this setup are given below the table.) |
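
Below is a minimal, hypothetical sketch (not the paper's released code) of the RBM part of the setup described in the table: an RBM with 16 visible and 16 hidden units, parameters initialized to white noise with standard deviation 0.01, trained with vanilla SGD on the 4x4 Bars And Stripes dataset, with the learning rate chosen by a line search over ten log-spaced values between 0.001 and 0.04. The training step shown is plain CD-1, and reconstruction error is used as a stand-in for the paper's end-of-training training-error criterion; the number of gradient steps is far smaller than the 10^5 used in the paper.

```python
# Hypothetical reproduction sketch; values not specified in the report are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def bars_and_stripes(side=4):
    """All 4x4 bar/stripe patterns, flattened to 16-dimensional binary vectors (30 total)."""
    patterns = set()
    for bits in range(2 ** side):
        rows = np.array([(bits >> i) & 1 for i in range(side)])
        patterns.add(tuple(np.repeat(rows, side)))   # horizontal stripes
        patterns.add(tuple(np.tile(rows, side)))     # vertical bars
    return np.array(sorted(patterns), dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, v0, lr, rng):
    """One CD-1 update with vanilla SGD and no regularization; returns reconstruction error."""
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return np.mean((v0 - pv1) ** 2)

data = bars_and_stripes()  # whole ground-truth dataset; no train/test split, as in the report
learning_rates = np.logspace(np.log10(0.001), np.log10(0.04), 10)  # line-search grid

results = {}
for lr in learning_rates:
    # White-noise initialization with standard deviation 0.01 (16 visible x 16 hidden units).
    W = 0.01 * rng.standard_normal((16, 16))
    b = 0.01 * rng.standard_normal(16)
    c = 0.01 * rng.standard_normal(16)
    for _ in range(5_000):  # illustrative run length only
        err = cd1_step(W, b, c, data, lr, rng)
    results[lr] = err       # end-of-training error proxy
best_lr = min(results, key=results.get)
print(f"best learning rate: {best_lr:.4f}")
```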
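A second hypothetical sketch covers the forward-forward part of the setup: two hidden layers of 500 units each, each trained locally with Adam, and a learning-rate grid of ten log-spaced values between 0.002 and 0.05. Only the architecture and optimizer configuration follow the report; the goodness threshold, batch size, and placeholder positive/negative batches are assumptions, and the construction of negative MNIST samples is omitted.

```python
# Hypothetical forward-forward configuration sketch; not the paper's released code.
import math
import torch
import torch.nn.functional as F

class FFLayer(torch.nn.Module):
    """One locally trained layer with a goodness-based (forward-forward style) loss."""
    def __init__(self, d_in, d_out, lr):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)
        self.threshold = 2.0  # assumed goodness threshold

    def forward(self, x):
        # Normalize so only the direction of the previous layer's activity is passed on.
        return F.relu(self.linear(x / (x.norm(dim=1, keepdim=True) + 1e-8)))

    def local_step(self, x_pos, x_neg):
        # Goodness = mean squared activation; push positives above and negatives
        # below the threshold with a softplus (logistic) loss, then update locally.
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)
        loss = F.softplus(torch.cat([self.threshold - g_pos, g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach so the next layer's update remains local.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

lr_grid = torch.logspace(math.log10(0.002), math.log10(0.05), 10)  # line-search grid
lr = float(lr_grid[4])  # one candidate value from the grid
layers = [FFLayer(784, 500, lr), FFLayer(500, 500, lr)]  # 784 = flattened MNIST input

# Placeholder positive/negative batches; in the real experiment these would be MNIST
# images with correct vs. incorrect labels embedded (construction omitted here).
x_pos, x_neg = torch.rand(64, 784), torch.rand(64, 784)
for layer in layers:
    x_pos, x_neg = layer.local_step(x_pos, x_neg)
```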