Understanding Representation of Deep Equilibrium Models from Neural Collapse Perspective
Authors: Haixiang Sun, Ye Shi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our theoretical analyses through experiments in both balanced and imbalanced scenarios. |
| Researcher Affiliation | Academia | Haixiang Sun, ShanghaiTech University, sunhx@shanghaitech.edu.cn; Ye Shi, ShanghaiTech University, shiye@shanghaitech.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | We will release the code once the paper is accepted. |
| Open Datasets | Yes | Experimental results on CIFAR-10 and CIFAR-100 validated our theoretical findings for distinguishing the differences between DEQ and explicit neural networks. |
| Dataset Splits | Yes | We conducted experiments with varying configurations with different numbers of majority and minority classes and imbalance degrees. Assume the numbers of majority and minority classes are (K_A, K_B) with corresponding sample sizes (n_A, n_B); the imbalance degree is denoted as R = n_A / n_B. We considered different setups for majority and minority class quantities, such as (3, 7), (5, 5), and (7, 3). Additionally, we varied the ratio R of sample quantities between majority and minority classes with values of 10, 50, and 100. (See the first sketch after this table.) |
| Hardware Specification | Yes | All experiments were implemented using PyTorch on an NVIDIA Tesla A40 48GB. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | We implement the solver with a threshold ε set to 10^-3 and introduce an early stopping mechanism: if convergence is not achieved within T = 20 iterations, we terminate the fixed-point iteration. During training, we set the learning rate to 1 × 10^-4 and utilize stochastic gradient descent with a momentum of 0.9 and weight decay of 5 × 10^-4. Both E_W and E_H are set to 0.01. The training phase for each network consists of 100 epochs, with a batch size of 128. (See the second sketch after this table.) |
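
To make the imbalanced-split description in the "Dataset Splits" row concrete, here is a minimal sketch of how such a subset could be constructed. This is not the authors' released code: the helper name `make_imbalanced_cifar10` and the per-class base count of 5,000 (CIFAR-10's full per-class training size) are assumptions; only the (K_A, K_B) splits and the ratio R come from the quoted text.

```python
# Hypothetical sketch: build an imbalanced CIFAR-10 subset with k_major
# majority classes (n_major samples each) and k_minor minority classes
# (n_major // ratio samples each), so R = n_A / n_B = ratio.
import numpy as np
from torchvision import datasets, transforms
from torch.utils.data import Subset

def make_imbalanced_cifar10(root, k_major=5, k_minor=5, n_major=5000, ratio=10):
    assert k_major + k_minor == 10
    train = datasets.CIFAR10(root, train=True, download=True,
                             transform=transforms.ToTensor())
    targets = np.array(train.targets)
    n_minor = n_major // ratio
    keep = []
    for cls in range(10):
        idx = np.where(targets == cls)[0]
        n_keep = n_major if cls < k_major else n_minor
        keep.extend(idx[:n_keep].tolist())
    return Subset(train, keep)

# Example: (K_A, K_B) = (7, 3) with R = 100 -> 5000 vs. 50 samples per class.
train_set = make_imbalanced_cifar10("./data", k_major=7, k_minor=3,
                                    n_major=5000, ratio=100)
```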
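
The solver settings in the "Experiment Setup" row can likewise be illustrated. Below is a minimal sketch assuming a generic implicit layer z* = f(z*, x): the toy `DEQLayer`, the naive forward iteration, and the relative-residual stopping rule are illustrative assumptions, while the tolerance (10^-3), the iteration cap (20), the batch size (128), and the SGD hyperparameters follow the values quoted above.

```python
import torch
import torch.nn as nn

class DEQLayer(nn.Module):
    """Toy implicit layer z* = tanh(W z* + U x); the paper's actual
    architecture is not specified at this level of detail."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.U = nn.Linear(dim, dim)

    def forward(self, z, x):
        return torch.tanh(self.W(z) + self.U(x))

def fixed_point(f, x, z0, eps=1e-3, max_iter=20):
    """Forward iteration z <- f(z, x), stopping early when the relative
    residual drops below eps, and terminating after max_iter iterations
    even if the tolerance has not been reached."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if torch.linalg.norm(z_next - z) / (torch.linalg.norm(z) + 1e-8) < eps:
            return z_next
        z = z_next
    return z

layer = DEQLayer(64)
x = torch.randn(128, 64)                      # batch size 128, as quoted
z_star = fixed_point(layer, x, torch.zeros_like(x))

# SGD configuration quoted in the paper: lr 1e-4, momentum 0.9, wd 5e-4.
optimizer = torch.optim.SGD(layer.parameters(), lr=1e-4,
                            momentum=0.9, weight_decay=5e-4)
```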