Hidden Symmetries of ReLU Networks
Authors: Elisenda Grigsby, Kathryn Lindsey, David Rolnick
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted an empirical investigation of hidden symmetries at various parameter settings. In Figure 6, we plot the distribution of approximate functional dimensions for networks with depth d = 4, 5, 6 and with n0 = n1 = ... = n(d−1) equal to 5, 10, 15. |
| Researcher Affiliation | Academia | (1) Department of Mathematics, Boston College, Boston, USA; (2) School of Computer Science, McGill University, Montreal, Canada; (3) Mila - Quebec AI Institute, Montreal, Canada. |
| Pseudocode | No | The paper describes methods and theoretical proofs, but it does not contain any formally labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code or provide a link to a code repository for the methodology described. |
| Open Datasets | No | In our experiments, we consider networks with nd = 1, and to approximate the functional dimension, we evaluate the set of gradient vectors {∇θFθ(z)}z∈Z over a finite subset of m points Z = {z1, ..., zm} ⊂ R^n0 in input space (we use points sampled i.i.d. from the zero-centered unit normal). This describes a data generation strategy, not a specific, publicly available dataset. |
| Dataset Splits | No | The paper describes generating input points for functional dimension approximation ('m points Z = {z1, ..., zm} ⊂ R^n0 in input space') rather than using a pre-defined dataset with explicit training, validation, or testing splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | We initialize networks with weights drawn i.i.d. from normal distributions with variance 2/fan-in, according to standard practice for ReLU networks (He et al., 2015; Hanin & Rolnick, 2018), and biases drawn i.i.d. from a normal distribution with very small variance (arbitrarily set to 0.01). To improve the quality of the approximation in (4), we use m sample points for m equal to 100 times the maximum possible value for dim_fun(θ). A sketch of this setup appears after the table. |
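
The rows for "Open Datasets" and "Experiment Setup" quote the paper's experimental procedure: initialize a ReLU network with weights of variance 2/fan-in and small-variance biases, sample input points i.i.d. from a standard normal, and approximate the functional dimension as the rank of the matrix of parameter gradients ∇θFθ(z) over those points. The following is a minimal sketch of that procedure, assuming PyTorch; the architecture, the rank tolerance, the use of the parameter count as a stand-in for the maximum possible dim_fun(θ), and the helper names (`make_relu_net`, `approx_functional_dim`) are illustrative assumptions, not the authors' code.

```python
# Sketch (not the authors' implementation) of the functional-dimension estimate
# described in the paper: rank of the m x P matrix of parameter gradients of the
# scalar output F_theta(z) at m inputs z ~ N(0, I).
import torch
import torch.nn as nn


def make_relu_net(widths):
    """Fully connected ReLU network; widths = [n0, n1, ..., nd] with nd = 1."""
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        layers.append(nn.Linear(n_in, n_out))
        layers.append(nn.ReLU())
    layers.pop()  # no activation after the final scalar output layer
    net = nn.Sequential(*layers)
    for module in net:
        if isinstance(module, nn.Linear):
            # Weights ~ N(0, 2 / fan-in); biases ~ N(0, 0.01) (variance 0.01),
            # matching the quoted initialization.
            nn.init.normal_(module.weight, std=(2.0 / module.in_features) ** 0.5)
            nn.init.normal_(module.bias, std=0.01 ** 0.5)
    return net


def approx_functional_dim(net, n0, m, tol=1e-6):
    """Numerical rank of the m x P matrix of gradients dF_theta(z)/dtheta."""
    params = list(net.parameters())
    grads = []
    for _ in range(m):
        z = torch.randn(1, n0)                    # z ~ N(0, I) in input space
        out = net(z).squeeze()                    # scalar output F_theta(z)
        g = torch.autograd.grad(out, params)
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    G = torch.stack(grads)                        # shape (m, #parameters)
    s = torch.linalg.svdvals(G)
    return int((s > tol * s[0]).sum())            # rank by singular-value cutoff


if __name__ == "__main__":
    # Paper notation: n0 = n1 = ... = n(d-1) = 5, nd = 1, depth d = 4.
    widths = [5, 5, 5, 5, 1]
    net = make_relu_net(widths)
    # The paper sets m to 100x the maximum possible dim_fun(theta); the total
    # parameter count is used here as a crude upper bound, capped so the
    # example runs quickly.
    n_params = sum(p.numel() for p in net.parameters())
    m = min(100 * n_params, 2000)
    print("approximate functional dimension:", approx_functional_dim(net, widths[0], m))
```

For the paper's larger settings (d = 4, 5, 6 with widths 5, 10, 15), taking m equal to 100 times the maximum possible dim_fun(θ) requires many thousands of gradient evaluations; the cap on m above exists only to keep the sketch quick to run.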