GEQ: Gaussian Kernel Inspired Equilibrium Models

Authors: Mingjie Li, Yisen Wang, Zhouchen Lin

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM response: From Section 4 (Empirical Results): In our experiments, we employed parallel GEQs with different input scales, as in MOptEqs, and averaged the output of each branch after average pooling or nearest up-sampling to fuse the branches. We use weight normalization to ensure convergence, as in MOptEqs and MDEQ, and set $\gamma$ to $0.2/M$, where $M$ is the minimum $\|x_z W_h\|_2^2$ among all patches. For the equilibrium calculation, we used the Anderson algorithm in the forward procedure, similar to other implicit models [28], and applied phantom gradients [14] for back-propagation. All models were trained using SGD with a step learning rate schedule. We implemented our experiments on the PyTorch platform [39] using an RTX 3090 GPU. Further details can be found in Appendix A.6.
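The quoted setup names the Anderson algorithm for the forward fixed-point solve. For orientation, below is a minimal Anderson acceleration routine in PyTorch for a generic fixed-point map, in the spirit of public DEQ implementations; the hyperparameters (`m`, `beta`, `lam`, `tol`) are conventional defaults, not values from the paper.

```python
import torch

def anderson(f, z0, m=5, max_iter=50, tol=1e-4, beta=1.0, lam=1e-4):
    """Anderson acceleration for the fixed point z* = f(z*).

    f  : fixed-point map on flattened states of shape (bsz, d)
    z0 : initial iterate, shape (bsz, d)
    m  : history size; beta : mixing parameter; lam : least-squares regularizer
    """
    bsz, d = z0.shape
    X = torch.zeros(bsz, m, d, dtype=z0.dtype, device=z0.device)  # past iterates
    F = torch.zeros(bsz, m, d, dtype=z0.dtype, device=z0.device)  # past f(iterates)
    X[:, 0], F[:, 0] = z0, f(z0)
    X[:, 1], F[:, 1] = F[:, 0], f(F[:, 0])

    # Bordered linear system whose solution gives the combination weights alpha.
    H = torch.zeros(bsz, m + 1, m + 1, dtype=z0.dtype, device=z0.device)
    H[:, 0, 1:] = H[:, 1:, 0] = 1.0
    y = torch.zeros(bsz, m + 1, 1, dtype=z0.dtype, device=z0.device)
    y[:, 0] = 1.0

    z = X[:, 1]
    for k in range(2, max_iter):
        n = min(k, m)
        G = F[:, :n] - X[:, :n]  # residuals g_i = f(z_i) - z_i
        H[:, 1:n+1, 1:n+1] = torch.bmm(G, G.transpose(1, 2)) + \
            lam * torch.eye(n, dtype=z0.dtype, device=z0.device)[None]
        alpha = torch.linalg.solve(H[:, :n+1, :n+1], y[:, :n+1])[:, 1:n+1, 0]

        # Mix the histories with the solved weights, then evaluate f once.
        z = beta * (alpha[:, None] @ F[:, :n])[:, 0] + \
            (1 - beta) * (alpha[:, None] @ X[:, :n])[:, 0]
        X[:, k % m], F[:, k % m] = z, f(z)
        res = (F[:, k % m] - z).norm().item() / (1e-5 + z.norm().item())
        if res < tol:
            break
    return z
```

A DEQ-style model would flatten its state to `(bsz, d)`, call `anderson(lambda z: layer(z, x_flat), z0)`, and reshape the result back to feature-map form.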
Researcher Affiliation: Academia
LLM response: Mingjie Li¹, Yisen Wang¹,², Zhouchen Lin¹,²,³; ¹National Key Lab of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University; ²Institute for Artificial Intelligence, Peking University; ³Peng Cheng Laboratory
Pseudocode: Yes
LLM response: Algorithm 1: Calculating one layer GEQ.
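Algorithm 1 itself is not reproduced in this report. As a structural sketch only, the skeleton below shows what a one-layer equilibrium module looks like in PyTorch; the update rule (a tanh of two weight-normalized convolutions) is a placeholder assumption standing in for the actual GEQ update, not the paper's formula.

```python
import torch
import torch.nn as nn

class EquilibriumLayer(nn.Module):
    """Skeleton of a one-layer equilibrium module (placeholder update rule)."""

    def __init__(self, channels, max_iter=30, tol=1e-4):
        super().__init__()
        # Weight normalization, as the paper's setup mentions for convergence.
        self.conv_z = nn.utils.weight_norm(nn.Conv2d(channels, channels, 3, padding=1))
        self.conv_x = nn.utils.weight_norm(nn.Conv2d(channels, channels, 3, padding=1))
        self.max_iter, self.tol = max_iter, tol

    def update(self, z, x):
        # Placeholder fixed-point map; GEQ's Algorithm 1 defines its own update.
        return torch.tanh(self.conv_z(z) + self.conv_x(x))

    def forward(self, x):
        z = torch.zeros_like(x)
        with torch.no_grad():  # solve for the equilibrium without building a graph
            for _ in range(self.max_iter):
                z_next = self.update(z, x)
                if (z_next - z).norm() < self.tol * (1e-5 + z.norm()):
                    z = z_next
                    break
                z = z_next
        # Re-attach one differentiable step at the equilibrium; phantom
        # gradients [14] generalize this to several damped unrolled steps.
        return self.update(z, x)
```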
Open Source Code: No
LLM response: The paper does not contain an explicit statement about open-sourcing the code for GEQ, nor does it provide a link to a code repository.
Open Datasets: Yes
LLM response: First, we conduct experiments on CIFAR-10 and CIFAR-100, widely used datasets for image classification on small images. Beyond these small datasets, we also conducted experiments on large-scale image datasets, as presented in Table 2. The results clearly demonstrate the consistent superiority of our GEQ over other models. Particularly noteworthy is that GEQ achieves a 2% improvement on ImageNet-100 over the deep model ResNet-50 while consuming approximately half the number of parameters, which underscores the effectiveness and efficiency of GEQ on large-scale inputs.
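For reference, loading the named CIFAR datasets with torchvision looks like the following; the augmentations are conventional CIFAR defaults, not choices taken from the paper, and ImageNet-100 (a 100-class ImageNet subset) has no built-in torchvision loader.

```python
from torchvision import datasets, transforms

# Conventional CIFAR training-time augmentation (an assumption, not the
# paper's pipeline): pad-and-crop plus horizontal flips.
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
test_tf = transforms.ToTensor()

train_set = datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
test_set = datasets.CIFAR10("data", train=False, download=True, transform=test_tf)
# CIFAR-100 is identical with datasets.CIFAR100.
```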
Dataset Splits: No
LLM response: The paper mentions using widely used datasets such as CIFAR-10, CIFAR-100, Imagenette, and ImageNet-100, which typically have standard splits. However, it does not explicitly state the training, validation, and test split percentages or sample counts within the text.
Hardware Specification: Yes
LLM response: We implemented our experiments on the PyTorch platform [39] using an RTX 3090 GPU.
Software Dependencies: No
LLM response: The paper mentions using the PyTorch platform [39] but does not specify its version number or any other software dependencies with their versions.
Experiment Setup: Yes
LLM response: In our experiments, we employed parallel GEQs with different input scales, as in MOptEqs, and averaged the output of each branch after average pooling or nearest up-sampling to fuse the branches. We use weight normalization to ensure convergence, as in MOptEqs and MDEQ, and set $\gamma$ to $0.2/M$, where $M$ is the minimum $\|x_z W_h\|_2^2$ among all patches. For the equilibrium calculation, we used the Anderson algorithm in the forward procedure, similar to other implicit models [28], and applied phantom gradients [14] for back-propagation. All models were trained using SGD with a step learning rate schedule. We implemented our experiments on the PyTorch platform [39] using an RTX 3090 GPU. Further details can be found in Appendix A.6.
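The setup describes two concrete mechanics: fusing parallel multi-scale branches (average pooling to shrink, nearest up-sampling to grow, then averaging) and training with SGD under a step learning-rate schedule. A minimal sketch of both follows; the learning rate, momentum, weight decay, and milestones are placeholder assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def fuse_branches(branch_outputs, target_hw):
    """Average parallel branch outputs after bringing them to one spatial
    size: average pooling to shrink, nearest-neighbor up-sampling to grow."""
    resized = []
    for z in branch_outputs:
        if tuple(z.shape[-2:]) == tuple(target_hw):
            resized.append(z)
        elif z.shape[-2] > target_hw[0]:
            resized.append(F.adaptive_avg_pool2d(z, target_hw))
        else:
            resized.append(F.interpolate(z, size=target_hw, mode="nearest"))
    return torch.stack(resized).mean(dim=0)

# SGD with a step learning-rate schedule, as stated in the setup; the
# concrete hyperparameter values below are placeholders.
model = nn.Conv2d(3, 16, 3)  # stand-in for the full multi-branch GEQ network
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)
```

In training, `fuse_branches` would combine the equilibrium outputs of the parallel GEQ branches before the classifier head, and `scheduler.step()` would be called once per epoch.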