Separation and Bias of Deep Equilibrium Models on Expressivity and Learning Dynamics
Authors: Zhoutong Wu, Yimu Zhang, Cong Fang, Zhouchen Lin
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments on FNNs and DEQs based on our theoretical results. We first evaluate the expressivity of both networks on the functions proposed in our two separation results. Then we experiment on specific OOD tasks. (A minimal DEQ forward-pass sketch follows the table.) |
| Researcher Affiliation | Academia | (1) Academy for Advanced Interdisciplinary Studies, Peking University; (2) State Key Lab of General AI, School of Intelligence Science and Technology, Peking University; (3) Institute for Artificial Intelligence, Peking University; (4) Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented in the paper. |
| Open Source Code | No | We will provide open access to the code after publication. |
| Open Datasets | No | For each experiment, we generate all binary sequences in {±1}^d \ U for training. (A data-generation sketch follows the table.) |
| Dataset Splits | No | No explicit mention of validation splits or methodology for reproducing the experiments is provided. |
| Hardware Specification | Yes | For all experiments, we execute our programs on an Nvidia GTX 1660; all programs occupy less than 10 MB of memory and run for less than 2 minutes. |
| Software Dependencies | No | No software frameworks, libraries, or versions are named; the paper specifies only training hyperparameters: ℓ2 loss, AdamW optimizer [44], learning rate 5e-4, weight decay 1e-4, and a cosine annealing scheduler for 1000 iterations. |
| Experiment Setup | Yes | Following the standard setting, all models in our experiment are trained using ℓ2 loss with the AdamW optimizer [44], with a learning rate of 5e-4, weight decay of 1e-4, and a cosine annealing scheduler for 1000 iterations. (A training-configuration sketch follows the table.) |
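
The experiments compare feed-forward networks (FNNs) against deep equilibrium models (DEQs), whose output is defined implicitly by a fixed point z* = f_θ(z*, x) rather than by a finite stack of layers. Since the code is not yet released, the following is only a minimal PyTorch sketch of a DEQ forward pass: the tanh update rule, the layer widths, and backpropagation through the unrolled solver (real DEQ implementations typically differentiate implicitly via the implicit function theorem) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyDEQ(nn.Module):
    """Illustrative DEQ layer: finds z* with z* = tanh(W z* + U x) by naive
    fixed-point iteration, then reads out a scalar prediction from z*."""

    def __init__(self, in_dim, hidden_dim, max_iter=50, tol=1e-4):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.U = nn.Linear(in_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 1)
        self.max_iter, self.tol = max_iter, tol

    def forward(self, x):
        # Start from z = 0 and iterate the update map until it converges.
        z = torch.zeros(x.shape[0], self.W.in_features, device=x.device)
        for _ in range(self.max_iter):
            z_next = torch.tanh(self.W(z) + self.U(x))
            if (z_next - z).norm() < self.tol:  # stop once the iterate is stable
                z = z_next
                break
            z = z_next
        return self.head(z)
```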
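The quoted data setup enumerates every binary sequence and withholds a set U from training, which is what makes the OOD evaluation possible. Below is a sketch of that split, assuming d is small enough to enumerate all 2^d sequences and that U is simply a random subset; both the dimension and the choice of U here are hypothetical, as the excerpt does not specify them.

```python
import itertools
import torch

def enumerate_split(d, holdout_size, seed=0):
    """Build all 2^d sequences in {±1}^d, then split them into a training set
    ({±1}^d minus U) and a held-out set U."""
    all_seqs = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=d)))
    perm = torch.randperm(len(all_seqs), generator=torch.Generator().manual_seed(seed))
    u_idx, train_idx = perm[:holdout_size], perm[holdout_size:]
    return all_seqs[train_idx], all_seqs[u_idx]

x_train, x_heldout = enumerate_split(d=10, holdout_size=64)  # 960 train, 64 held out
```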
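The reported hyperparameters map one-to-one onto standard PyTorch components, so the training loop can be reconstructed as below. This is a sketch assuming full-batch training and torch.optim.AdamW / CosineAnnealingLR as the concrete optimizer and scheduler; the paper states the hyperparameter values but not these implementation details.

```python
import torch

def train(model, x, y, iters=1000):
    """ℓ2 (MSE) loss, AdamW with lr 5e-4 and weight decay 1e-4,
    cosine annealing over the full 1000 iterations."""
    opt = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=iters)
    loss_fn = torch.nn.MSELoss()
    for _ in range(iters):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        sched.step()  # decay the learning rate along the cosine schedule
    return model
```

Combined with the sketches above, `train(TinyDEQ(in_dim=10, hidden_dim=32), x_train, y_train)` would reproduce the described setup once targets `y_train` are defined by the task function under study.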