Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Clustering units in neural networks: upstream vs downstream information

Authors: Richard D. Lange, David Rolnick, Konrad Kording

TMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an empirical study quantifying modularity of hidden layer representations of a collection of feedforward networks trained on classification tasks, across a range of hyperparameters. For each model, we quantify pairwise associations between hidden units in each layer using a variety of both upstream and downstream measures, then cluster them by maximizing their modularity score using established tools from network science. We find two surprising results: first, dropout dramatically increased modularity, while other forms of weight regularization had more modest effects. Second, although we observe that there is usually good agreement about clusters within both upstream methods and downstream methods, there is little agreement about the cluster assignments across these two families of methods. This has important implications for representation learning, as it suggests that finding modular representations that reflect structure in inputs (e.g. disentanglement) may be a distinct goal from learning modular representations that reflect structure in outputs (e.g. compositionality).
Researcher Affiliation | Academia | Richard D. Lange, Department of Neurobiology, University of Pennsylvania, Philadelphia, PA 19104; David S. Rolnick, Mila Québec AI Institute, McGill University, Montréal, Canada H3A 0G4; Konrad P. Kording, Department of Neurobiology, University of Pennsylvania, Philadelphia, PA 19104
Pseudocode | Yes | A.1 Algorithms. This section gives pseudocode for the algorithm we used to compute clusters P from the normalized matrix of pairwise associations between units, A. Algorithm 1: Full clustering algorithm. Algorithm 2: Pseudocode for the greedy, approximate, spectral method for finding modules. Algorithm 3: Pseudocode for the Monte Carlo method for improving clusters.
Open Source Code | Yes | Code is publicly available at https://github.com/KordingLab/clustering-units-upstream-downstream.
Open Datasets | Yes | In our initial experiments, we began by studying a large collection of simple feedforward fully-connected networks trained on the MNIST dataset (LeCun et al., 1998) across a range of regularization schemes (Experiment 1).
Dataset Splits | No | The paper mentions using the MNIST dataset and discarding models with less than 80% test accuracy, and it refers to 'test accuracy', but it does not explicitly state how the dataset was partitioned into training, validation, and test splits for the experiments.
Hardware Specification | No | All compute jobs were run on a private server and managed using GNU Parallel (Tange, 2011).
Software Dependencies | No | Models were written and trained using PyTorch (Paszke et al., 2019) and PyTorch Lightning, and all compute jobs were run on a private server and managed using GNU Parallel (Tange, 2011).
Experiment Setup | Yes | We conduct an empirical study quantifying modularity of hidden layer representations of a collection of feedforward networks trained on classification tasks, across a range of hyperparameters. For each model, we quantify pairwise associations between hidden units in each layer using a variety of both upstream and downstream measures, then cluster them by maximizing their modularity score using established tools from network science. Table S1: Models and hyperparameters. The number of units in each hidden layer is given in parentheses in the first column, i.e. MNIST (64, 64) is an MLP with two hidden layers of 64 units each. Each row of the table describes one hyperparameter sweep performed for the corresponding model. L2 regularization was always set to a minimum of 1e-5 to avoid weights growing unboundedly (see Figures S2 through S4 for performance metrics and weight norms of trained models).
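The pipeline summarized in the table first quantifies pairwise associations between hidden units before clustering them. A minimal sketch of one plausible association measure, absolute Pearson correlation of unit activations across inputs, is below; this is an illustrative choice only, as the paper uses several distinct upstream and downstream measures.

```python
import numpy as np

def association_matrix(activations):
    """Pairwise association between hidden units, measured here as the
    absolute Pearson correlation of their activations across inputs.
    (Illustrative choice; not one of the paper's specific measures.)

    activations: (n_inputs, n_units) array of hidden-layer activations.
    Returns a symmetric (n_units, n_units) matrix with zero diagonal.
    """
    A = np.abs(np.corrcoef(activations, rowvar=False))
    np.fill_diagonal(A, 0.0)  # exclude self-association
    return A

rng = rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 8))  # fake activations: 200 inputs, 8 hidden units
A = association_matrix(acts)
print(A.shape)  # (8, 8)
```

The resulting normalized association matrix A plays the role of the input to the clustering step (Algorithm 1).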
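Algorithm 2 is described as a greedy, approximate, spectral method for finding modules by maximizing modularity. The sketch below shows the standard spectral step such methods build on, Newman's leading-eigenvector bisection of the modularity matrix; it is not the authors' implementation, just a self-contained illustration of the technique.

```python
import numpy as np

def spectral_bisection(A):
    """Split units into two modules by the sign of the leading eigenvector
    of the modularity matrix B = A - k k^T / (2m), where k is the vector
    of association strengths (weighted degrees) and 2m their total.
    Sketch of Newman's spectral modularity method, not the paper's code.
    """
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    eigvals, eigvecs = np.linalg.eigh(B)      # ascending eigenvalues
    leading = eigvecs[:, np.argmax(eigvals)]  # leading eigenvector
    return (leading >= 0).astype(int)         # module label per unit

# Two planted modules: strong within-module association, weak between.
A = np.block([[np.full((4, 4), 0.9), np.full((4, 4), 0.1)],
              [np.full((4, 4), 0.1), np.full((4, 4), 0.9)]])
np.fill_diagonal(A, 0.0)
labels = spectral_bisection(A)  # recovers the two planted modules
```

A full method would apply this bisection recursively and then refine assignments, as the paper's Algorithm 3 does with a Monte Carlo improvement step.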
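The hyperparameter sweeps described by Table S1 amount to expanding a grid of settings into one training configuration per combination. A minimal sketch follows; the dropout and L2 values are hypothetical placeholders, not the paper's settings, apart from the stated 1e-5 L2 floor.

```python
from itertools import product

# Hypothetical sweep grid in the spirit of Table S1 (values illustrative).
grid = {
    "dropout": [0.0, 0.2, 0.5],
    "l2": [1e-5, 1e-4, 1e-3],   # floor of 1e-5 to keep weights bounded
    "hidden": [(64, 64)],       # MNIST (64, 64): two hidden layers of 64 units
}

# One config dict per point in the cross-product of all settings.
configs = [dict(zip(grid, vals)) for vals in product(*grid.values())]
print(len(configs))  # 3 * 3 * 1 = 9 configurations
```

Each resulting config would correspond to one trained model, with models under 80% test accuracy discarded as the paper describes.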