Symmetry Induces Structure and Constraint of Learning
Authors: Liu Ziyin
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the result to four different problems and numerically validate the theory in Section 3. |
| Researcher Affiliation | Collaboration | MIT, NTT Research. |
| Pseudocode | No | The paper describes an algorithm (DCS) in prose and mathematical formulation in Section 2.6 but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology described in this paper is released or provide a link to it. |
| Open Datasets | Yes | We train a Resnet18 on the CIFAR-10 dataset, following the standard training procedures. |
| Dataset Splits | No | The paper mentions training on the CIFAR-10 dataset and using unseen test points but does not explicitly detail the training/validation/test splits or cross-validation methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions using SGD and Adam optimizers but does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train a Resnet18 on the CIFAR-10 dataset, following the standard training procedures. We compute the correlation matrix of neuron firing of the penultimate layer of the model, which follows a fully connected layer. We compare the matrix for training both with and without weight decay and for both pre- and post-activations (see Appendix B). See Figure 3-right, which shows that homogeneous solutions are preferred when weight decay is used, in agreement with the prediction of Theorem 1. Here, training uses SGD with 0.9 momentum and batch size 128, consistent with standard practice. We use a cosine learning rate scheduler for 200 epochs. |
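The Experiment Setup row quotes training a ResNet18 on CIFAR-10 with SGD (0.9 momentum, batch size 128), a cosine learning rate schedule over 200 epochs, and runs with and without weight decay, followed by a correlation matrix of penultimate-layer activations. The sketch below illustrates that setup with standard PyTorch/torchvision components; the learning rate, weight decay value, data augmentation, and the choice of `avgpool` output as the penultimate layer are illustrative assumptions, not details confirmed by the quoted text.

```python
# Minimal sketch of the described setup, assuming standard torchvision components.
# Hyperparameters not stated in the paper's quoted text (lr, weight_decay value,
# augmentation, normalization stats) are assumptions for illustration only.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T


def train_cifar10_resnet18(weight_decay=5e-4, epochs=200, batch_size=128, lr=0.1,
                           device="cuda" if torch.cuda.is_available() else "cpu"):
    # Common CIFAR-10 preprocessing (assumed; the paper only says "standard training procedures").
    transform = T.Compose([
        T.RandomCrop(32, padding=4),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])
    train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                             download=True, transform=transform)
    loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                         shuffle=True, num_workers=2)

    # Stock torchvision ResNet18 with a 10-class head (a simplification; CIFAR-specific
    # variants often also modify the first conv layer).
    model = torchvision.models.resnet18(num_classes=10).to(device)
    criterion = nn.CrossEntropyLoss()
    # SGD with 0.9 momentum and batch size 128, as stated; weight_decay=0.0 gives the no-decay run.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                                weight_decay=weight_decay)
    # Cosine learning-rate schedule over 200 epochs, as stated.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    for _ in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model


@torch.no_grad()
def penultimate_correlation(model, loader, device="cpu"):
    # Collect penultimate-layer activations (here assumed to be the avgpool output that
    # feeds the final fully connected layer) via a forward hook, then compute the
    # neuron-by-neuron correlation matrix compared in Figure 3-right.
    feats = []
    handle = model.avgpool.register_forward_hook(
        lambda module, inp, out: feats.append(out.flatten(1).cpu()))
    model.eval()
    for x, _ in loader:
        model(x.to(device))
    handle.remove()
    acts = torch.cat(feats)           # shape: (num_samples, 512)
    return torch.corrcoef(acts.T)     # shape: (512, 512)
```

Running `train_cifar10_resnet18(weight_decay=5e-4)` and `train_cifar10_resnet18(weight_decay=0.0)` and then comparing `penultimate_correlation` for the two models mirrors the with/without weight decay comparison described above.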