Critical feature learning in deep neural networks
Authors: Kirsten Fischer, Javed Lindner, David Dahmen, Zohar Ringel, Michael Krämer, Moritz Helias
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | EXPERIMENTS: We compare the obtained analytical results for the output kernel C^(L) conditioned on the training data to the numerical implementation of sampling the kernel C^(L)_emp from the posterior distribution using Langevin stochastic gradient descent (see Appendix G). As a measure we use the centered kernel alignments (CKA, see Appendix I) of both the analytical kernel C^(L) and the Langevin-sampled kernels C^(L)_emp with the target kernel YY^T, respectively. Since our framework does not presuppose any assumptions on the data, we study two different tasks: XOR and binary classification on MNIST digits; the numerical results match our theoretical expectations consistently in both cases. (A minimal CKA sketch is given below the table.) |
| Researcher Affiliation | Academia | 1 Institute for Advanced Simulation (IAS-6), Computational and Systems Neuroscience, Jülich Research Centre, Jülich, Germany; 2 RWTH Aachen University, Aachen, Germany; 3 Department of Physics, RWTH Aachen University, Aachen, Germany; 4 Institute for Theoretical Particle Physics and Cosmology, RWTH Aachen University, Aachen, Germany; 5 The Racah Institute of Physics, The Hebrew University of Jerusalem, Jerusalem, Israel. |
| Pseudocode | Yes | Algorithm 1 Width annealing of kernels |
| Open Source Code | Yes | All code is available under (10.5281/zenodo.11205498). URL: https://doi.org/10.5281/zenodo.11205498 |
| Open Datasets | Yes | XOR (Refinetti et al., 2021) (...) MNIST: We study a binary classification task on MNIST (LeCun et al., 1998) between digits 0 and 3. (A short digit-filtering sketch is given below the table.) |
| Dataset Splits | No | The paper mentions the number of training data points (P) for XOR and MNIST tasks but does not specify any training/validation/test splits, percentages, or absolute counts for dataset partitioning. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory configurations used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python version, library versions like PyTorch or TensorFlow) are mentioned in the paper. |
| Experiment Setup | Yes | Parameters: XOR task with σ² = 0.4, D = 100, L = 3, (a) N = 500, P = 12, (b) P = 12, g_l = 1.2, (c) N = 500, g_l = 1.2. Results are averaged over 10 training data sets and error bars indicate standard deviation. Parameters: MNIST task with L = 2, N = 2000. Results are averaged over 10 training data sets and error bars indicate ±1 standard deviation. Other parameters: XOR task with σ² = 0.4, g_l ∈ {0.6, 0.825, 1.1, g_crit ≈ 1.38, 2.2}, g_b = 0.05, L = 20, N = 500, κ = 10⁻³, P = 12. (An illustrative XOR data-generation sketch with these parameters is given below the table.) |
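
The Research Type excerpt compares the analytical kernel C^(L) and the Langevin-sampled kernel C^(L)_emp to the target kernel YY^T via centered kernel alignment. Below is a minimal sketch of a linear CKA computation in NumPy; the helper name `cka` and the placeholder kernels are illustrative assumptions, not the paper's implementation (see its Appendix I and the Zenodo archive for that).

```python
import numpy as np

def cka(K, L):
    """Linear centered kernel alignment between two P x P kernel matrices."""
    P = K.shape[0]
    H = np.eye(P) - np.ones((P, P)) / P      # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H            # center both kernels
    return np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

# Illustrative usage with placeholder kernels (P = 12 as in the XOR setup).
rng = np.random.default_rng(0)
P = 12
Y = np.sign(rng.standard_normal((P, 1)))     # placeholder +/-1 labels
target = Y @ Y.T                             # target kernel Y Y^T
C_analytic = target + 0.1 * np.eye(P)        # stand-in for the analytical C^(L)
noise = rng.standard_normal((P, P))
C_emp = target + 0.1 * (noise + noise.T)     # stand-in for the sampled C^(L)_emp

print("CKA(analytic, target):", cka(C_analytic, target))
print("CKA(empirical, target):", cka(C_emp, target))
```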
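The XOR parameters quoted in the Experiment Setup row (σ² = 0.4, D = 100, P = 12) suggest a Gaussian-mixture XOR dataset embedded in D input dimensions. The sketch below is one plausible construction under that assumption; cluster placement and noise scaling follow common conventions for this task (Refinetti et al., 2021) rather than the paper's exact code.

```python
import numpy as np

def make_xor_data(P=12, D=100, sigma2=0.4, seed=0):
    """Gaussian-mixture XOR task: four clusters centered at (+-1, +-1) in the
    first two input dimensions, isotropic noise of variance sigma2 in all D
    dimensions, label = sign of the product of the two cluster coordinates.
    Embedding and scaling conventions here are assumptions, not the paper's
    exact construction."""
    rng = np.random.default_rng(seed)
    corners = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])
    idx = rng.integers(0, 4, size=P)             # cluster assignment per sample
    X = np.sqrt(sigma2) * rng.standard_normal((P, D))
    X[:, :2] += corners[idx]                     # place cluster means in first two dims
    y = corners[idx, 0] * corners[idx, 1]        # labels in {+1, -1}
    return X, y

X_train, y_train = make_xor_data()
print(X_train.shape, y_train)
```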
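For the MNIST task, the paper restricts to digits 0 and 3. The following is a minimal filtering sketch, assuming the images and labels are already loaded as NumPy arrays from any standard MNIST loader (the loading step itself is omitted); the function name and the ±1 label convention are assumptions for illustration.

```python
import numpy as np

def binarize_mnist(images, labels, digit_a=0, digit_b=3):
    """Keep only two digit classes and map them to +/-1 labels.
    `images`: (N, 784) array, `labels`: (N,) integer array."""
    mask = (labels == digit_a) | (labels == digit_b)
    X = images[mask].astype(np.float64) / 255.0   # rescale pixels to [0, 1]
    y = np.where(labels[mask] == digit_a, +1.0, -1.0)
    return X, y
```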