Critical feature learning in deep neural networks

Authors: Kirsten Fischer, Javed Lindner, David Dahmen, Zohar Ringel, Michael Krämer, Moritz Helias

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | EXPERIMENTS: We compare the obtained analytical results for the output kernel C(L) conditioned on the training data to the numerical implementation of sampling the kernel C(L)_emp from the posterior distribution using Langevin stochastic gradient descent (see Appendix G). As a measure we use the centered kernel alignment (CKA, see Appendix I) of both the analytical kernel C(L) and the Langevin-sampled kernel C(L)_emp with the target kernel Y Y^T, respectively. Since our framework does not presuppose any assumptions on the data, we study two different tasks: XOR and binary classification on MNIST digits; the numerical results match our theoretical expectations consistently in both cases.
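The quoted passage compares analytical and sampled kernels via centered kernel alignment. As a minimal sketch of how such a comparison can be computed, here is a standard CKA formulation in NumPy; the paper's exact variant is defined in its Appendix I and may differ, and the function name and example kernels below are illustrative, not from the paper.

```python
import numpy as np

def centered_kernel_alignment(K1, K2):
    """Standard CKA between two Gram matrices: the normalized
    Frobenius inner product of the doubly centered kernels."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K1c = H @ K1 @ H
    K2c = H @ K2 @ H
    hsic = np.sum(K1c * K2c)             # <K1c, K2c>_F
    norm = np.linalg.norm(K1c) * np.linalg.norm(K2c)
    return hsic / norm

# Example: alignment of a kernel with a rank-one target kernel y y^T
y = np.array([1.0, 1.0, -1.0, -1.0])    # toy binary labels
Y = np.outer(y, y)                       # target kernel
K = Y + 0.1 * np.eye(4)                  # a kernel close to the target
print(centered_kernel_alignment(K, Y))   # close to 1
```

Values near 1 indicate that the (centered) kernel is well aligned with the target, which is the sense in which the analytical and Langevin-sampled kernels are compared to Y Y^T above.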
Researcher Affiliation | Academia | 1 Institute for Advanced Simulation (IAS-6), Computational and Systems Neuroscience, Jülich Research Centre, Jülich, Germany; 2 RWTH Aachen University, Aachen, Germany; 3 Department of Physics, RWTH Aachen University, Aachen, Germany; 4 Institute for Theoretical Particle Physics and Cosmology, RWTH Aachen University, Aachen, Germany; 5 The Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel.
Pseudocode | Yes | Algorithm 1: Width annealing of kernels
Open Source Code | Yes | All code is available under DOI 10.5281/zenodo.11205498 (https://doi.org/10.5281/zenodo.11205498).
Open Datasets | Yes | XOR (Refinetti et al., 2021) (...) MNIST: We study a binary classification task on MNIST (LeCun et al., 1998) between digits 0 and 3.
Dataset Splits | No | The paper mentions the number of training data points (P) for the XOR and MNIST tasks but does not specify any training/validation/test splits, percentages, or absolute counts for dataset partitioning.
Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory configurations used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python version, library versions such as PyTorch or TensorFlow) are mentioned in the paper.
Experiment Setup | Yes | Parameters: XOR task with σ² = 0.4, D = 100, L = 3, (a) N = 500, P = 12, (b) P = 12, g_l = 1.2, (c) N = 500, g_l = 1.2. Results are averaged over 10 training data sets and error bars indicate the standard deviation. Parameters: MNIST task with L = 2, N = 2000. Results are averaged over 10 training data sets and error bars indicate ± one standard deviation. Other parameters: XOR task with σ² = 0.4, g_l ∈ {0.6, 0.825, 1.1, g_crit ≈ 1.38, 2.2}, g_b = 0.05, L = 20, N = 500, κ = 10⁻³, P = 12.
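The setup above samples empirical kernels C(L)_emp via Langevin stochastic gradient descent. As a minimal sketch of that sampling loop, here is a toy one-hidden-layer tanh network trained with noisy gradient steps, with the empirical output kernel collected after burn-in. All architectural choices, parameter values, and variable names here are illustrative assumptions; the paper's actual networks are deeper, use the g_l/g_b/κ parameterization quoted above, and are specified in its Appendix G.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative; not the paper's D = 100 XOR setup)
D, N, P = 2, 50, 12          # input dim, width, number of training points
eta, temp, steps = 1e-3, 1e-3, 20000

X = rng.standard_normal((P, D))
y = np.sign(X[:, 0] * X[:, 1])          # XOR-like labels (illustrative only)

W = rng.standard_normal((D, N)) / np.sqrt(D)   # hidden-layer weights
a = rng.standard_normal(N) / np.sqrt(N)        # readout weights

kernels = []
for t in range(steps):
    h = X @ W                  # preactivations
    phi = np.tanh(h)           # nonlinearity (assumed; paper may differ)
    f = phi @ a                # network output
    err = f - y
    # Gradients of the mean squared loss
    grad_a = phi.T @ err / P
    grad_W = X.T @ (err[:, None] * a * (1.0 - phi**2)) / P
    # Langevin update: gradient step plus Gaussian noise ~ sqrt(2*eta*temp)
    a -= eta * grad_a + np.sqrt(2 * eta * temp) * rng.standard_normal(N)
    W -= eta * grad_W + np.sqrt(2 * eta * temp) * rng.standard_normal((D, N))
    if t > steps // 2 and t % 100 == 0:  # collect samples after burn-in
        kernels.append(phi @ phi.T / N)  # empirical hidden-layer kernel

C_emp = np.mean(kernels, axis=0)         # posterior-averaged kernel estimate
print(C_emp.shape)
```

An estimate like C_emp is what would then be compared to the analytical kernel via CKA; averaging over many post-burn-in samples approximates the posterior expectation of the kernel.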