Only Strict Saddles in the Energy Landscape of Predictive Coding Networks?

Authors: Francesco Innocenti, El Mehdi Achour, Ryan Singh, Christopher L. Buckley

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on both linear and non-linear networks strongly validate our theory and further suggest that all the saddles of the equilibrated energy are strict."
Researcher Affiliation | Academia | Francesco Innocenti, School of Engineering and Informatics, University of Sussex, F.Innocenti@sussex.ac.uk; El Mehdi Achour, RWTH Aachen University, Aachen, Germany, achour@mathc.rwth-aachen.de; Ryan Singh, School of Engineering and Informatics, University of Sussex, rs773@sussex.ac.uk; Christopher L. Buckley, School of Engineering and Informatics, University of Sussex & VERSES, c.l.buckley@sussex.ac.uk
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Code to reproduce all the experiments is available at https://github.com/francesco-innocenti/pc-saddles."
Open Datasets | Yes | "We trained DLNs with different number of hidden layers H ∈ {2, 5, 10} on standard image classification datasets (MNIST, Fashion-MNIST and CIFAR10)."
Dataset Splits | No | The paper mentions training networks and observing training loss dynamics but does not explicitly provide information on train/validation/test splits, proportions, or specific methods for data partitioning.
Hardware Specification | No | The paper's NeurIPS checklist states: "Most experimental results can be reproduced in a few hours on a CPU, with the exception of those related to Figures 5 & 12 which were run on a GPU (typically A100)." This is not a specific hardware specification for all experiments.
Software Dependencies | No | The paper mentions using "standard Euler integration" and a "second-order explicit Runge-Kutta ODE solver (Heun)" but does not list specific software libraries or frameworks with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x). A minimal sketch of the two update rules appears after the table.
Experiment Setup | Yes | "The following hyperparameters were used for all networks: 300 hidden units and SGD with learning rate η = 1e-3 and batch size b = 64. We used a second-order explicit Runge-Kutta ODE solver (Heun) with a maximum upper integration limit T = 300 and an adaptive Proportional-Integral-Derivative controller (absolute and relative tolerances: 1e-3) to ensure convergence of the PC inference dynamics (Eq. 3). All networks were initialised close to the origin, W_ij ∼ N(0, σ²) with σ = 5e-3." A hedged sketch of this solver setup also appears after the table.
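
For reference, the two integrators named in the Software Dependencies row differ only in their update rule. Below is a minimal, self-contained Python/NumPy sketch; the vector field f here is a hypothetical placeholder (simple linear decay), not the paper's energy gradient.

```python
import numpy as np

# Hypothetical vector field for illustration only (linear decay);
# in the paper the dynamics follow the energy gradient instead.
def f(z):
    return -z

def euler_step(z, dt):
    # "Standard Euler integration": first-order explicit update.
    return z + dt * f(z)

def heun_step(z, dt):
    # Second-order explicit Runge-Kutta (Heun): an Euler predictor,
    # then the average of the slopes at the two endpoints.
    z_pred = z + dt * f(z)
    return z + 0.5 * dt * (f(z) + f(z_pred))

z = np.ones(3)
for _ in range(100):
    z = heun_step(z, dt=0.1)
```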
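
The solver configuration quoted in the Experiment Setup row (Heun, adaptive PID step-size controller, tolerances 1e-3, upper integration limit T = 300) matches the interface of the JAX library Diffrax, though the paper does not name its dependencies, so the choice of tooling here is an assumption. The energy below is a toy two-layer linear PC model standing in for the paper's networks, and the names (energy, vector_field, z0) are illustrative rather than taken from the released code.

```python
import jax
import jax.numpy as jnp
import diffrax

# Toy two-layer *linear* PC energy (an assumption for illustration;
# the paper's networks have 300 hidden units and up to 10 layers).
def energy(z, params, x, y):
    W1, W2 = params
    e1 = z - W1 @ x        # hidden-layer prediction error
    e2 = y - W2 @ z        # output-layer prediction error
    return 0.5 * (e1 @ e1 + e2 @ e2)

# PC inference dynamics (Eq. 3 in the paper): gradient flow on the
# energy with respect to the activities z.
def vector_field(t, z, args):
    params, x, y = args
    return -jax.grad(energy)(z, params, x, y)

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
sigma = 5e-3                                   # init scale from the paper
W1 = sigma * jax.random.normal(k1, (300, 784))
W2 = sigma * jax.random.normal(k2, (10, 300))
x, y = jnp.ones(784), jnp.ones(10)             # dummy input/target
z0 = W1 @ x                                    # feedforward init (an assumption)

sol = diffrax.diffeqsolve(
    diffrax.ODETerm(vector_field),
    diffrax.Heun(),                            # second-order explicit RK (Heun)
    t0=0.0, t1=300.0, dt0=0.1, y0=z0,          # upper integration limit T = 300
    args=((W1, W2), x, y),
    stepsize_controller=diffrax.PIDController(rtol=1e-3, atol=1e-3),
)
z_star = sol.ys[-1]                            # equilibrated activities
```

The PID controller adapts the step size until the local error estimate meets the stated tolerances, which is one plausible reading of how the quoted setup "ensure[s] convergence of the PC inference dynamics".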