Understanding and Improving Optimization in Predictive Coding Networks
Authors: Nicholas Alonso, Jeffrey Krichmar, Emre Neftci
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In sum, in our simulations below, IL with sequential inference and the MQ optimizer requires no more memory than BP/SGD, requires only slightly more computation than BP/SGD, converges to similar losses as BP/SGD, and is sensitive to higher-order information, which often leads to faster convergence than BP/SGD. As far as we know, this is the first time an energy-based learning algorithm (as defined in (Scellier and Bengio 2017; Whittington and Bogacz 2019)) has performed as well or better than BP/SGD on natural images in the ways described above without using memory-expensive optimizers and without requiring significantly more computation than BP. [Experiments] In this section, we test the performance of sequential inference and the MQ optimizer. |
| Researcher Affiliation | Academia | Nicholas Alonso1, Jeffrey Krichmar1,2, Emre Neftci3,4 — 1Department of Cognitive Science, University of California, Irvine; 2Department of Computer Science, University of California, Irvine; 3Electrical Engineering and Information Technology, RWTH Aachen, Germany; 4Peter Grünberg Institute, Forschungszentrum Jülich, Germany. nalonso2@uci.edu, jkrichma@uci.edu, e.neftci@fz-juelich.edu |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "The paper with full appendix can be found at https://arxiv.org/abs/2305.13562" but this link is for the paper itself and does not explicitly state that source code for the methodology is available there or elsewhere. |
| Open Datasets | Yes | Comparing Sequential and Simultaneous IL: The standard/simultaneous inference method creates a delayed error propagation through the network, as illustrated in figure 4. We test how this delay affects performance at very small values of T on classification of CIFAR-10 images. Next, we test the MQ optimizer combined with Seq-IL on classification tasks with natural image data sets: SVHN, CIFAR-10, and Tiny ImageNet. |
| Dataset Splits | No | The paper mentions using "Grid searches" to find learning rates and the use of "Mini-batches size 64", which implies some form of validation, but it does not provide specific details on the training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or solvers used in the experiments. |
| Experiment Setup | Yes | Mini-batches of size 64 are used in all simulations. The MQ optimizer works well in all models using the following hyper-parameters: αmin = .001, r = .000001, and ρ = .9999 for fully connected and ρ = .999 for convolutional networks. Grid searches were used to find the learning rate and the step size, ϵ, for activity updates. Models are trained over 45 epochs, about the amount of time needed for learning to near convergence. All IL algorithms use a highly truncated inference phase of T=3. We train fully connected MLPs of dimension 3072-3x1024-10 on SVHN and CIFAR-10, and small convolutional networks on SVHN, CIFAR-10, and Tiny ImageNet. We trained fully connected autoencoders (layer sizes 3072-1024-256-20-256-1024-3072) with ReLU at hidden layers and sigmoid at the output layer. IL models use T=6 inference iterations. |
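The sequential-vs-simultaneous distinction discussed above can be illustrated with a toy predictive-coding relaxation. The sketch below is NOT the paper's implementation: it uses a tiny linear network, illustrative names (`sim_inference`, `seq_inference`), and made-up sizes and step size. It only shows the structural difference the paper describes: simultaneous IL updates all hidden activities from errors computed once per iteration (so supervised error propagates one layer per iteration), while sequential IL sweeps layer by layer from the output inward, recomputing errors so the error signal reaches deep layers within a single iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny linear predictive-coding net: x[l+1] is predicted by W[l] @ x[l].
# Sizes and weight scale are arbitrary choices for this illustration.
sizes = [4, 8, 8, 3]
W = [rng.standard_normal((sizes[l + 1], sizes[l])) * 0.3 for l in range(3)]

def energy(x):
    """Sum of squared prediction errors over all layers."""
    return sum(float(np.sum((x[l + 1] - W[l] @ x[l]) ** 2)) for l in range(3))

def sim_inference(x, T=3, eps=0.05):
    """Simultaneous IL: all hidden activities take a (half-)gradient step
    using errors computed once at the start of each iteration."""
    x = [v.copy() for v in x]
    for _ in range(T):
        e = [x[l + 1] - W[l] @ x[l] for l in range(3)]  # stale within the iteration
        for l in range(1, 3):                            # input (l=0) and output (l=3) clamped
            x[l] -= eps * (e[l - 1] - W[l].T @ e[l])
    return x

def seq_inference(x, T=3, eps=0.05):
    """Sequential IL: sweep from the deepest hidden layer toward the input,
    recomputing local errors at each step, so the clamped output error
    reaches every hidden layer within one iteration."""
    x = [v.copy() for v in x]
    for _ in range(T):
        for l in range(2, 0, -1):
            e_below = x[l] - W[l - 1] @ x[l - 1]
            e_above = x[l + 1] - W[l] @ x[l]
            x[l] -= eps * (e_below - W[l].T @ e_above)
    return x

# Demo: clamp a random input and target, initialize hiddens feedforward, relax.
x0, target = rng.standard_normal(4), rng.standard_normal(3)
x_init = [x0, W[0] @ x0, W[1] @ (W[0] @ x0), target]
print(energy(x_init), energy(sim_inference(x_init)), energy(seq_inference(x_init)))
```

Both routines descend the same energy; the difference is purely the update schedule (Jacobi-like for simultaneous, Gauss-Seidel-like for sequential), which is why the delayed error propagation of simultaneous IL matters most at the very small T (e.g., T=3) used in the experiments above.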