Hidden Symmetries of ReLU Networks

Authors: Elisenda Grigsby, Kathryn Lindsey, David Rolnick

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted an empirical investigation of hidden symmetries at various parameter settings. In Figure 6, we plot the distribution of approximate functional dimensions for networks with depth d = 4, 5, 6 and with n_0 = n_1 = ... = n_{d-1} equal to 5, 10, 15.
Researcher Affiliation | Academia | (1) Department of Mathematics, Boston College, Boston, USA; (2) School of Computer Science, McGill University, Montreal, Canada; (3) Mila Quebec AI Institute, Montreal, Canada.
Pseudocode | No | The paper describes methods and theoretical proofs, but it does not contain any formally labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor a link to a code repository for the methodology described.
Open Datasets | No | In our experiments, we consider networks with n_d = 1, and to approximate the functional dimension, we evaluate the set of gradient vectors {∇_θ F_θ(z)}_{z∈Z} over a finite subset of m points Z = {z_1, ..., z_m} ⊂ R^{n_0} in input space (we use points sampled i.i.d. from the zero-centered unit normal). This describes a data-generation strategy, not a specific, publicly available dataset (see the gradient-rank sketch after the table).
Dataset Splits | No | The paper describes generating input points for functional-dimension approximation ('m points Z = {z_1, ..., z_m} ⊂ R^{n_0} in input space') rather than using a pre-defined dataset with explicit training, validation, or testing splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries).
Experiment Setup | Yes | We initialize networks with weights drawn i.i.d. from normal distributions with variance 2/fan-in, according to standard practice for ReLU networks (He et al., 2015; Hanin & Rolnick, 2018), and biases drawn i.i.d. from a normal distribution with very small variance (arbitrarily set to 0.01). To improve the quality of the approximation in (4), we use m sample points for m equal to 100 times the maximum possible value for dim_fun(θ) (see the initialization sketch after the table).
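
The Experiment Setup row describes standard He initialization with small-variance biases. Below is a minimal PyTorch sketch of one way to build and initialize such a fully-connected ReLU network; the helper name init_relu_net and the nn.Sequential structure are our own choices, not taken from the paper, and note that a bias variance of 0.01 corresponds to a standard deviation of 0.1.

```python
import torch.nn as nn

def init_relu_net(widths):
    # Hypothetical helper: widths = [n_0, n_1, ..., n_d], e.g. [5, 5, 5, 5, 1].
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        layers.append(nn.Linear(n_in, n_out))
        layers.append(nn.ReLU())
    layers.pop()  # no ReLU after the final layer (the paper uses scalar output, n_d = 1)
    net = nn.Sequential(*layers)
    for module in net.modules():
        if isinstance(module, nn.Linear):
            # Weights i.i.d. N(0, 2 / fan-in), i.e. He initialization.
            nn.init.normal_(module.weight, mean=0.0,
                            std=(2.0 / module.in_features) ** 0.5)
            # Biases i.i.d. N(0, 0.01): variance 0.01, so std = 0.1.
            nn.init.normal_(module.bias, mean=0.0, std=0.1)
    return net
```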
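
The Open Datasets row quotes the paper's data-generation procedure: sample m inputs from a zero-centered unit normal and evaluate the parameter gradients ∇_θ F_θ(z). One plausible way to turn those gradients into an approximate functional dimension, consistent with the quoted text, is to stack them into an m × (number of parameters) matrix and take its numerical rank. In the sketch below, approx_functional_dim, the rank tolerance atol, and the use of the raw parameter count as a stand-in for the paper's "maximum possible value for dim_fun(θ)" are all our assumptions, not details given in the paper; init_relu_net is the helper from the previous sketch.

```python
import torch

def approx_functional_dim(net, n0, m, atol=1e-6):
    # Hypothetical helper: estimate dim_fun(theta) as the numerical rank of the
    # matrix whose rows are the gradients grad_theta F_theta(z_i) at m inputs.
    params = list(net.parameters())
    rows = []
    for _ in range(m):
        z = torch.randn(1, n0)                  # z_i ~ N(0, I), as in the paper
        out = net(z).squeeze()                  # scalar output, since n_d = 1
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    J = torch.stack(rows)                       # shape (m, num_parameters)
    # atol is a tolerance we chose; the paper does not specify one.
    return torch.linalg.matrix_rank(J, atol=atol).item()

# Example at one of the paper's settings: depth d = 4, widths 5, n_d = 1.
widths = [5, 5, 5, 5, 1]
net = init_relu_net(widths)
num_params = sum(p.numel() for p in net.parameters())
# The paper sets m to 100 times the maximum possible dim_fun(theta); we
# substitute the parameter count as a crude upper bound for that quantity.
print(approx_functional_dim(net, n0=widths[0], m=100 * num_params))
```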