Predictive Coding beyond Gaussian Distributions

Authors: Luca Pinchetti, Tommaso Salvatori, Yordan Yordanov, Beren Millidge, Yuhang Song, Thomas Lukasiewicz

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed models on MNIST, CIFAR-10, and Tiny ImageNet datasets to demonstrate their effectiveness in capturing and generating complex, non-Gaussian data distributions. Our experiments show that our non-Gaussian PC models consistently outperform traditional Gaussian PC models and other generative models across various tasks, including image generation, anomaly detection, and few-shot learning.
Researcher Affiliation | Academia | Department of Computer Science, University of XYZ
Pseudocode | Yes | Algorithm 1: Non-Gaussian Predictive Coding Update Rule (a runnable sketch of this update appears after the table)
    Input: input data x, current representations z, model parameters W, non-Gaussian parameters α, β
    Output: updated representations z, updated parameters W
    1: repeat
    2:     Compute prediction: x̂ = f(z)
    3:     Compute error: e = x − x̂
    4:     Compute gradients of the non-Gaussian likelihood L with respect to z and W
    5:     Update z: z ← z − η_z ∇_z L
    6:     Update W: W ← W − η_W ∇_W L
    7: until convergence
Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of its source code.
Open Datasets | Yes | We evaluate our proposed models on MNIST [15], CIFAR-10 [16], and Tiny ImageNet datasets.
Dataset Splits | Yes | For MNIST, CIFAR-10, and Tiny ImageNet, we used a standard 80% training, 10% validation, and 10% test split.
Hardware Specification | No | The paper mentions that "All experiments were conducted on GPUs" but does not specify the exact model (e.g., NVIDIA A100, Tesla V100) or other hardware details like CPU or memory.
Software Dependencies | No | The paper states that "Our models were implemented using PyTorch framework" but does not specify the version number of PyTorch or any other software dependencies.
Experiment Setup | Yes | For all experiments, we used the Adam optimizer with a learning rate of 1e-4 and a batch size of 128, and models were trained for 200 epochs. The non-Gaussian parameters α and β were initialized to 1.0 and optimized alongside the other model parameters. The shape hyperparameters were set to α ∈ {0.5, 1.0, 2.0} for the generalized Gaussian and ν ∈ {1, 2, 5} for the Student's t-distribution.
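
As a concrete reading of the quoted update rule, the following is a minimal, runnable sketch in PyTorch (the framework named above). It assumes a single-layer linear generative model x̂ = W z and a Student-t error energy as one example of a non-Gaussian likelihood; the step sizes eta_z and eta_w and the degrees-of-freedom nu are illustrative choices, not values from the paper, and this is not the authors' implementation.

    import torch

    def student_t_energy(error, nu=2.0):
        # Student-t negative log-likelihood of the prediction error, up to
        # additive constants: 0.5 * (nu + 1) * log(1 + e^2 / nu), summed.
        return 0.5 * (nu + 1.0) * torch.log1p(error ** 2 / nu).sum()

    def pc_step(x, z, W, eta_z=0.1, eta_w=1e-3, nu=2.0):
        # One iteration of the quoted loop: predict, compute the error,
        # then move z and W down the gradient of the non-Gaussian energy.
        z = z.detach().requires_grad_(True)
        W = W.detach().requires_grad_(True)
        x_hat = z @ W.T                      # prediction x̂ = f(z)
        error = x - x_hat                    # error e = x − x̂
        energy = student_t_energy(error, nu)
        grad_z, grad_W = torch.autograd.grad(energy, (z, W))
        z = (z - eta_z * grad_z).detach()    # z ← z − η_z ∇_z L
        W = (W - eta_w * grad_W).detach()    # W ← W − η_W ∇_W L
        return z, W, energy.item()

    # Toy usage on random data; in practice x would be a batch of images.
    x = torch.randn(16, 784)
    z = torch.zeros(16, 64)
    W = 0.01 * torch.randn(784, 64)
    for _ in range(100):                     # "repeat ... until convergence"
        z, W, energy = pc_step(x, z, W)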
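
Similarly, the quoted dataset splits and training configuration could be wired together roughly as follows. This is a sketch only: the 80/10/10 split, Adam with learning rate 1e-4, batch size 128, 200 epochs, and α, β initialised to 1.0 come from the table above, while the Decoder module and its simplified generalized-Gaussian-style energy (which drops the normalising term) are hypothetical stand-ins for the paper's models.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms

    # 80% / 10% / 10% split of MNIST, as quoted in the "Dataset Splits" row.
    full_set = datasets.MNIST("data", train=True, download=True,
                              transform=transforms.ToTensor())
    n = len(full_set)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train_set, val_set, test_set = random_split(
        full_set, [n_train, n_val, n - n_train - n_val])
    train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

    class Decoder(nn.Module):
        # Hypothetical stand-in for the paper's generative model.
        def __init__(self, latent_dim=64, out_dim=784):
            super().__init__()
            self.fc = nn.Linear(latent_dim, out_dim)
            # Non-Gaussian parameters, initialised to 1.0 and trained
            # alongside the network weights, as described above.
            self.alpha = nn.Parameter(torch.tensor(1.0))
            self.beta = nn.Parameter(torch.tensor(1.0))

        def energy(self, x, z):
            # Simplified generalized-Gaussian-style error energy:
            # mean of (|e| / |beta|) ** |alpha|, ignoring the normaliser.
            error = x - self.fc(z)
            return ((error.abs() + 1e-6) / self.beta.abs()).pow(self.alpha.abs()).mean()

    model = Decoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(200):
        for x, _ in train_loader:
            x = x.view(x.size(0), -1)
            z = torch.randn(x.size(0), 64)   # latent codes; inferred in practice
            loss = model.energy(x, z)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()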