Predictive Coding beyond Correlations
Authors: Tommaso Salvatori, Luca Pinchetti, Amine M’Charrak, Beren Millidge, Thomas Lukasiewicz
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show how such findings can be used to improve the performance of predictive coding in image classification tasks, and conclude that such models are able to perform simple end-to-end causal inference tasks. |
| Researcher Affiliation | Collaboration | 1VERSES Research Lab, Los Angeles, CA 90016, USA 2Institute of Logic and Computation, Vienna University of Technology, Austria 3Department of Computer Science, University of Oxford, UK 4MRC Brain Network Dynamics Unit, University of Oxford, UK 5Zyphra, Palo Alto, CA, USA. Correspondence to: Tommaso Salvatori <tommaso.salvatori@verses.ai>. |
| Pseudocode | Yes | We provide the pseudocode of the training process on PC graphs in Algorithm 1. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a direct link to a repository for the methodology described. |
| Open Datasets | Yes | We then show how interventional queries can be used to improve the test accuracy of PC graphs on MNIST and Fashion MNIST. and The classification experiments are performed on the MNIST, Fashion MNIST, and 2-MNIST datasets. |
| Dataset Splits | No | The paper mentions training data and test data, e.g., 'We use observational training data, X, to fit the PC model.' and 'We evaluate the learned SCM by comparing various difference metrics between true and inferred counterfactual values.' However, it does not provide specific percentages or counts for training/validation/test splits, nor does it explicitly define a validation set. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers like 'vanilla stochastic gradient descent (SGD)' and 'AdamW optimizer' but does not provide version numbers for any software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | The PC graph is trained with 3000 samples for 1000 epochs with a batch size of 128. We use the vanilla stochastic gradient descent (SGD) optimizer for the node values with a learning rate of γ = 3e-3 and T = 8 iterations for inference of node values during training and testing. For the weights, we use the AdamW optimizer with a learning rate of α = 8e-3 and a weight decay of λ_w = 1e-4. |
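
The setup quoted in the last two rows of the table (an inner relaxation of node values with vanilla SGD, followed by a weight update with AdamW) can be illustrated with a minimal PyTorch sketch. This is not the authors' Algorithm 1 or their PC-graph implementation: the chain topology, layer sizes, tanh activation, and the names `energy` and `train_step` are illustrative assumptions; only the quoted hyperparameters (γ = 3e-3, T = 8, α = 8e-3, λ_w = 1e-4, batch size 128) come from the paper.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the experiment-setup row above.
BATCH_SIZE = 128
T = 8                # inference iterations for node values
GAMMA = 3e-3         # SGD learning rate for node values
ALPHA = 8e-3         # AdamW learning rate for weights
WEIGHT_DECAY = 1e-4  # AdamW weight decay

# Hypothetical layer sizes for a small MNIST-style chain x0 -> x1 -> x2.
dims = [784, 256, 10]
weights = nn.ModuleList(nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1))
w_opt = torch.optim.AdamW(weights.parameters(), lr=ALPHA, weight_decay=WEIGHT_DECAY)

def energy(nodes):
    """Sum of squared prediction errors between consecutive value nodes."""
    e = 0.0
    for i, layer in enumerate(weights):
        pred = torch.tanh(layer(nodes[i]))
        e = e + 0.5 * ((nodes[i + 1] - pred) ** 2).sum()
    return e

def train_step(x, y_onehot):
    # Clamp the first node to the image and the last node to the label.
    nodes = [x.clone()] + [torch.zeros(x.size(0), d) for d in dims[1:]]
    nodes[-1] = y_onehot.clone()
    for n in nodes[1:-1]:
        n.requires_grad_(True)

    # Inference: relax the hidden node values with vanilla SGD for T iterations.
    node_opt = torch.optim.SGD(nodes[1:-1], lr=GAMMA)
    for _ in range(T):
        node_opt.zero_grad()
        energy(nodes).backward()
        node_opt.step()

    # Learning: one AdamW step on the weights at the relaxed node values.
    w_opt.zero_grad()
    loss = energy([n.detach() for n in nodes])
    loss.backward()
    w_opt.step()
    return loss.item()

# Toy usage on random tensors shaped like flattened MNIST digits.
x = torch.rand(BATCH_SIZE, 784)
y = torch.nn.functional.one_hot(torch.randint(0, 10, (BATCH_SIZE,)), 10).float()
print(train_step(x, y))
```

The two-optimizer split mirrors the quoted setup: each batch runs T = 8 SGD steps on the node values with the weights held fixed, then a single AdamW step on the weights at the relaxed node values.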