GEQ: Gaussian Kernel Inspired Equilibrium Models
Authors: Mingjie Li, Yisen Wang, Zhouchen Lin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Empirical Results): In our experiments, we employed parallel GEQs with different input scales, as in MOptEqs, and averaged the output of each branch after average pooling or nearest up-sampling to fuse the branches. We use weight normalization to ensure convergence, as in MOptEqs and MDEQ, and set γ to 0.2/M, where M is the minimum $\|x_z W_h\|_2^2$ among all patches. For the equilibrium calculation, we used the Anderson algorithm in the forward procedure, similar to other implicit models [28], and applied Phantom gradients [14] for back-propagation. All models were trained using SGD with a step learning rate schedule. We implemented our experiments on the PyTorch platform [39] using an RTX-3090 GPU. Further details can be found in Appendix A.6. |
| Researcher Affiliation | Academia | Mingjie Li¹, Yisen Wang¹,², Zhouchen Lin¹,²,³. ¹National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; ²Institute for Artificial Intelligence, Peking University; ³Peng Cheng Laboratory |
| Pseudocode | Yes | Algorithm 1: Calculating one layer GEQ. |
| Open Source Code | No | The paper does not contain an explicit statement about open-sourcing the code for GEQ or provide a link to a code repository. |
| Open Datasets | Yes | First, we conducted experiments on CIFAR-10 and CIFAR-100, widely used datasets for image classification on small images. Beyond these small datasets, we also conducted experiments on large-scale image datasets, as presented in Table 2. The results clearly demonstrate the consistent superiority of our GEQ over other models, highlighting its clear advantages. Particularly noteworthy, GEQ achieves a 2% improvement on ImageNet-100 over the deep model ResNet-50 while consuming approximately half the number of parameters, which emphasizes the effectiveness and efficiency of GEQ on large-scale inputs. |
| Dataset Splits | No | The paper mentions using widely used datasets like CIFAR-10, CIFAR-100, ImageNette, and ImageNet-100, which typically have standard splits. However, it does not explicitly provide the specific percentages or sample counts for training, validation, and testing splits for these datasets within the text. |
| Hardware Specification | Yes | We implemented our experiments on the PyTorch platform [39] using an RTX-3090 GPU. |
| Software Dependencies | No | The paper mentions using the 'PyTorch platform [39]' but does not specify its version number or any other software dependencies with their versions. |
| Experiment Setup | Yes | In our experiments, we employed parallel GEQs with different input scales, as in MOptEqs, and averaged the output of each branch after average pooling or nearest up-sampling to fuse the branches. We use weight normalization to ensure convergence, as in MOptEqs and MDEQ, and set γ to 0.2/M, where M is the minimum $\|x_z W_h\|_2^2$ among all patches. For the equilibrium calculation, we used the Anderson algorithm in the forward procedure, similar to other implicit models [28], and applied Phantom gradients [14] for back-propagation. All models were trained using SGD with a step learning rate schedule. We implemented our experiments on the PyTorch platform [39] using an RTX-3090 GPU. Further details can be found in Appendix A.6. (A hedged sketch of this forward/backward procedure appears below the table.) |
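
The experiment-setup row quotes the paper's use of the Anderson algorithm for the forward equilibrium computation and Phantom gradients [14] for back-propagation. Below is a minimal PyTorch sketch of that combination, assuming a generic layer function `f`; the solver memory `m`, ridge term `lam`, tolerance, unrolling depth, and damping are illustrative placeholders rather than the paper's reported hyperparameters, and the code is not the authors' implementation.

```python
import torch

def anderson(f, x0, m=5, lam=1e-4, max_iter=50, tol=1e-3, beta=1.0):
    """Anderson acceleration for the fixed point z* = f(z*)."""
    bsz, d = x0.shape[0], x0[0].numel()
    X = torch.zeros(bsz, m, d, dtype=x0.dtype, device=x0.device)
    F = torch.zeros(bsz, m, d, dtype=x0.dtype, device=x0.device)
    X[:, 0], F[:, 0] = x0.reshape(bsz, -1), f(x0).reshape(bsz, -1)
    X[:, 1], F[:, 1] = F[:, 0], f(F[:, 0].reshape_as(x0)).reshape(bsz, -1)

    # Regularized least-squares system for the mixing coefficients alpha.
    H = torch.zeros(bsz, m + 1, m + 1, dtype=x0.dtype, device=x0.device)
    H[:, 0, 1:] = H[:, 1:, 0] = 1
    y = torch.zeros(bsz, m + 1, 1, dtype=x0.dtype, device=x0.device)
    y[:, 0] = 1

    for k in range(2, max_iter):
        n = min(k, m)
        G = F[:, :n] - X[:, :n]  # residuals of the stored iterates
        H[:, 1:n + 1, 1:n + 1] = torch.bmm(G, G.transpose(1, 2)) \
            + lam * torch.eye(n, dtype=x0.dtype, device=x0.device)[None]
        alpha = torch.linalg.solve(H[:, :n + 1, :n + 1], y[:, :n + 1])[:, 1:n + 1, 0]

        # Mix past iterates and evaluate the layer at the mixed point.
        X[:, k % m] = beta * (alpha[:, None] @ F[:, :n])[:, 0] \
            + (1 - beta) * (alpha[:, None] @ X[:, :n])[:, 0]
        F[:, k % m] = f(X[:, k % m].reshape_as(x0)).reshape(bsz, -1)
        rel_res = (F[:, k % m] - X[:, k % m]).norm() / (1e-5 + F[:, k % m].norm())
        if rel_res.item() < tol:
            break
    return X[:, k % m].reshape_as(x0)

def equilibrium_forward(f, x0, unroll_steps=5, damping=0.5):
    """Solve z* = f(z*) without autograd, then attach a phantom-gradient
    path via a few damped, gradient-tracked unrolled steps."""
    with torch.no_grad():
        z = anderson(f, x0)
    for _ in range(unroll_steps):
        z = (1 - damping) * z + damping * f(z)
    return z
```

In training, `f` would stand in for the (weight-normalized) GEQ layer at a fixed input, and the returned `z` feeds the downstream head; gradients then flow only through the short unrolled tail rather than through an implicit differentiation of the solver, which is the essence of the phantom-gradient estimator.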