Emergence of Sparse Representations from Noise
Authors: Trenton Bricken, Rylan Schaeffer, Bruno Olshausen, Gabriel Kreiman
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that trained networks follow these theoretical predictions (e.g., Fig. 2). In proportion to the noise variance up to a cutoff, each neuron learns a negative bias so that it is off by default. |
| Researcher Affiliation | Academia | (1) Systems, Synthetic and Quantitative Biology, Harvard University; (2) Redwood Center for Theoretical Neuroscience, University of California, Berkeley; (3) Computer Science, Stanford University; (4) Programs in Biophysics and Neuroscience, Harvard Medical School. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code and training parameters can be found at: https://github.com/TrentBrick/SparsityFromNoise. |
| Open Datasets | Yes | We primarily use the CIFAR10 dataset of 50,000 images with 32x32x3 dimensions, training either on the raw pixels (flattening them into a 3,072 dimensional vector) or latent embeddings of 256 dimensions, produced by a ConvMixer pretrained on ImageNet (Trockman & Kolter, 2022; Russakovsky et al., 2015). |
| Dataset Splits | No | The paper mentions '94.3% validation accuracy' but does not specify the dataset split percentages or counts for training, validation, and test sets. It implies a validation set was used but provides no details for reproduction. |
| Hardware Specification | No | The paper mentions 'Cluster time for the Transformer and Deep Model experiments was provided by Hofvarpnir Studios.' but does not specify any particular GPU, CPU models, or other hardware details. |
| Software Dependencies | No | The paper mentions software like 'PyTorch' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use Kaiming randomly initialized weights (He et al., 2015) and train until the fraction of active neurons converges. ... Our loss function uses the mean squared error between the original image and reconstruction across our full dataset, X. ... We test noise levels σ ∈ {0.05, 0.1, 0.3, 0.8, 1.5, 3.0, 10.0}, L1 ∈ {1e-04, 1e-05, 1e-06, 1e-07, 1e-08}, and Top-k ∈ {3, 10, 30, 100, 300, 1000, 3000}. For Top-k we linearly annealed the k value from 10,000 down to its final value within the first 500 epochs. ... We investigate this finding further in Appendix H.2. [Which discusses learning rate and batch size.] (See the illustrative sketches below the table.) |
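To make the quoted setup concrete, below is a minimal PyTorch sketch of the noisy-autoencoder training loop it describes: CIFAR10 images flattened to 3,072-dimensional vectors, Kaiming-initialized weights, Gaussian noise added to the input, mean-squared-error reconstruction of the original (clean) image, and an optional L1 penalty on the hidden activations. The hidden width, optimizer, learning rate, batch size, and the class and variable names are assumptions for illustration and are not specified in the rows above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumed hyperparameters -- the paper sweeps sigma and the L1 weight,
# but the hidden width, optimizer, and batch size here are illustrative.
SIGMA = 0.3          # std of the Gaussian noise added to inputs
L1_WEIGHT = 1e-6     # coefficient on the optional L1 activation penalty
HIDDEN_DIM = 10_000  # assumption; chosen to match the Top-k anneal start
INPUT_DIM = 32 * 32 * 3  # raw CIFAR10 pixels flattened to 3,072 dims

# CIFAR10, flattened into vectors as described in the Open Datasets row.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1)),  # 3x32x32 -> 3072
])
train_set = datasets.CIFAR10(root="data", train=True, download=True,
                             transform=transform)
loader = DataLoader(train_set, batch_size=256, shuffle=True)

class NoisyAutoencoder(nn.Module):
    """Single-hidden-layer ReLU autoencoder with Kaiming-initialized weights."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)
        nn.init.kaiming_normal_(self.encoder.weight)
        nn.init.kaiming_normal_(self.decoder.weight)

    def forward(self, x):
        h = F.relu(self.encoder(x))
        return self.decoder(h), h

model = NoisyAutoencoder(INPUT_DIM, HIDDEN_DIM)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer is an assumption

# One illustrative training pass; the paper trains until the fraction
# of active neurons converges.
for x, _ in loader:
    noisy_x = x + SIGMA * torch.randn_like(x)   # inject Gaussian noise
    recon, h = model(noisy_x)
    loss = F.mse_loss(recon, x)                 # reconstruct the clean input
    loss = loss + L1_WEIGHT * h.abs().mean()    # optional L1 sparsity penalty
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The noise level SIGMA and the L1 weight would be swept over the values listed in the Experiment Setup row.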
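The Experiment Setup row also describes a Top-k sparsity variant in which k is linearly annealed from 10,000 down to its final value within the first 500 epochs. A sketch of that constraint and schedule follows, assuming the annealing is applied once per epoch (the row does not state the granularity); the function names `topk_activation` and `annealed_k` are hypothetical.

```python
import torch

def topk_activation(h: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the k largest activations per example; zero out the rest."""
    if k >= h.shape[-1]:
        return h
    _, idx = torch.topk(h, k, dim=-1)
    mask = torch.zeros_like(h).scatter_(-1, idx, 1.0)
    return h * mask

def annealed_k(epoch: int, k_final: int, k_start: int = 10_000,
               anneal_epochs: int = 500) -> int:
    """Linearly anneal k from k_start down to k_final over the first
    anneal_epochs epochs, as stated in the experiment setup."""
    if epoch >= anneal_epochs:
        return k_final
    frac = epoch / anneal_epochs
    return int(round(k_start + frac * (k_final - k_start)))
```

In the autoencoder sketch above, `topk_activation(h, annealed_k(epoch, k_final))` would be applied to the hidden activations `h` in place of the L1 penalty, with `k_final` drawn from the Top-k sweep values.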