Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
$\mu$PC: Scaling Predictive Coding to 100+ Layer Networks
Authors: Francesco Innocenti, El Mehdi Achour, Christopher L Buckley
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we show that 100+ layer PCNs can be trained reliably using a Depth-µP parameterisation [72, 3] which we call µPC. By analysing the scaling behaviour of PCNs, we reveal several pathologies that make standard PCNs difficult to train at large depths. We then show that, despite addressing only some of these instabilities, µPC allows stable training of very deep (up to 128-layer) residual networks on simple classification tasks with competitive performance and little tuning compared to current benchmarks. Moreover, µPC enables zero-shot transfer of both weight and activity learning rates across widths and depths. Our results serve as a first step towards scaling PC to more complex architectures and have implications for other local algorithms. Code for µPC is made available as part of a JAX library for PCNs. (Figure 1 shows test accuracy plots for ReLU ResNets trained on MNIST.) |
| Researcher Affiliation | Collaboration | Francesco Innocenti (School of Engineering and Informatics, University of Sussex, UK); El Mehdi Achour (UM6P College of Computing, Rabat, Morocco); Christopher L. Buckley (School of Engineering and Informatics, University of Sussex, UK; VERSES AI Research Lab, Los Angeles, CA, USA) |
| Pseudocode | No | The paper describes the algorithms and their modifications within the main text and appendices using mathematical formulations, but it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code for µPC is made available as part of a JAX library for PCNs (https://github.com/thebuckleylab/jpc) [23]. |
| Open Datasets | Yes | We trained fully connected residual PCNs on standard image classification tasks (MNIST, Fashion-MNIST and CIFAR10). This simple setup was chosen because the main goal was to test whether µPC is capable of training deep PCNs, a task that has proved challenging with more complex datasets and architectures [48]. |
| Dataset Splits | Yes | First, we trained ResNets of varying depth (up to 128 layers) to classify MNIST for a single epoch. Remarkably, we find that µPC allows stable training of networks of all depths across different activation functions (Figs. 1 & A.16). ... As noted in §5, the results on Fashion-MNIST (Fig. A.18) were obtained with depth transfer by tuning 8-layer networks and transferring the optimal learning rates to 128 layers. |
| Hardware Specification | Yes | The experiments involving µPC, hyperparameter transfer, and the monitoring of the condition number of the Hessian during training were all run on an NVIDIA RTX A6000. |
| Software Dependencies | No | Code for µPC is made available as part of a JAX library for PCNs (https://github.com/thebuckleylab/jpc) [23]. We always used no biases, batch size B = 64, Adam as parameter optimiser, and GD as inference optimiser (with the exception of Figs. A.8 & A.24). For the SP, all networks used Kaiming Uniform $(W_\ell)_{ij} \sim \mathcal{U}(-1/\sqrt{N_{\ell-1}}, 1/\sqrt{N_{\ell-1}})$ as the standard (PyTorch) initialisation used to train PCNs. |
| Experiment Setup | Yes | We always used no biases, batch size B = 64, Adam as parameter optimiser, and GD as inference optimiser (with the exception of Figs. A.8 & A.24). For the SP, all networks used Kaiming Uniform $(W_\ell)_{ij} \sim \mathcal{U}(-1/\sqrt{N_{\ell-1}}, 1/\sqrt{N_{\ell-1}})$ as the standard (PyTorch) initialisation used to train PCNs. ... For the test accuracies in Figs. 1 & A.16, we trained fully connected ResNets (Eq. 24) to classify MNIST with standard PC, µPC and BP with Depth-µP. ... All networks had width N = 512 and always used as many GD inference iterations as the number of hidden layers $H \in \{2^i\}_{i=3}^{7}$. ... For µPC, we selected runs based on the best results from the depth transfer (see Hyperparameter transfer below). ... For the ResNets trained on MNIST with µPC (e.g. Fig. 1), we performed a 2D grid search over the following learning rates: η ∈ {5e-1, 1e-1, 5e-2, 1e-2} for the weights, and β ∈ {1e3, 5e2, 1e2, 5e1, 1e1, 5e0, 1e0, 5e-1, 1e-1, 5e-2, 1e-2} for the activities. |
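
To make the size of the sweep in the Experiment Setup row concrete, the sketch below enumerates the quoted hyperparameter values in plain Python. It is an illustration only, not code from the authors' jpc library: the helper names (`lr_grid`, `inference_steps`, `kaiming_uniform_bound`) are hypothetical, and the initialisation bound assumes PyTorch's default for linear layers, 1/sqrt(fan_in).

```python
import itertools
import math

# Values quoted in the "Experiment Setup" row.
WIDTH = 512                              # N, width of every hidden layer
BATCH_SIZE = 64                          # B
DEPTHS = [2 ** i for i in range(3, 8)]   # H in {2^i} for i = 3..7
WEIGHT_LRS = [5e-1, 1e-1, 5e-2, 1e-2]    # eta, weight learning rates
ACTIVITY_LRS = [1e3, 5e2, 1e2, 5e1, 1e1, 5e0, 1e0,
                5e-1, 1e-1, 5e-2, 1e-2]  # beta, activity learning rates


def lr_grid():
    """Hypothetical helper: every (eta, beta) pair of the 2D grid search."""
    return list(itertools.product(WEIGHT_LRS, ACTIVITY_LRS))


def inference_steps(depth):
    """The quoted setup uses as many GD inference iterations as hidden layers."""
    return depth


def kaiming_uniform_bound(fan_in):
    """Assumed bound b of U(-b, b), with b = 1/sqrt(fan_in) (PyTorch linear default)."""
    return 1.0 / math.sqrt(fan_in)


if __name__ == "__main__":
    print("depths:", DEPTHS)
    print("learning-rate pairs per depth:", len(lr_grid()))
    print("init bound at width 512:", kaiming_uniform_bound(WIDTH))
```

Under these values the grid holds 4 × 11 learning-rate pairs. With depth transfer, as the quoted text describes for Fashion-MNIST, the grid would only need to be run on the shallow network before reusing the best pair at 128 layers.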