Separation and Bias of Deep Equilibrium Models on Expressivity and Learning Dynamics

Authors: Zhoutong Wu, Yimu Zhang, Cong Fang, Zhouchen Lin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct experiments on FNNs and DEQs based on our theoretical results. We first evaluate the expressivity of both networks on the functions proposed in our two separation results. Then we experiment on specific OOD tasks."
Researcher Affiliation | Academia | 1. Academy for Advanced Interdisciplinary Studies, Peking University; 2. State Key Lab of General AI, School of Intelligence Science and Technology, Peking University; 3. Institute for Artificial Intelligence, Peking University; 4. Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China
Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented in the paper.
Open Source Code | No | "We will provide open access to the code after the paper's publication."
Open Datasets | No | "For each experiment, we generate all binary sequences in {±1}^d \ U for training." (See the dataset-generation sketch after the table.)
Dataset Splits | No | No explicit mention of validation splits or methodology for reproducing the experiments is provided.
Hardware Specification | Yes | "For all experiments, we execute our programs on an Nvidia GTX 1660; all programs occupy less than 10 MB of memory and run for less than 2 minutes."
Software Dependencies | No | No framework or library versions are specified; the training description only names the AdamW optimizer [44].
Experiment Setup | Yes | "Following the standard setting, all models in our experiment are trained using ℓ2 loss with the AdamW optimizer [44], with a learning rate of 5e-4, weight decay of 1e-4, and a cosine annealing scheduler for 1000 iterations." (See the training sketch below.)
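The dataset construction quoted in the Open Datasets row is straightforward to reproduce in outline. Below is a minimal sketch that enumerates all 2^d binary sequences over {±1} and holds out a set U for out-of-distribution evaluation; the dimension d and the rule defining U here are illustrative assumptions, since the paper's actual U depends on its separation constructions.

```python
# Minimal sketch of the training-set construction described in the paper:
# enumerate every binary sequence in {±1}^d, then hold out a set U for
# OOD evaluation. The dimension d and the rule defining U are assumptions
# for illustration, not values taken from the paper.
import itertools

import torch

d = 8  # sequence length (assumed; the paper does not fix d in this excerpt)

# All 2^d sequences over {±1}.
all_seqs = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=d)))

# Hypothetical held-out set U: here, sequences whose first coordinate is +1.
in_U = all_seqs[:, 0] > 0
train_x = all_seqs[~in_U]  # {±1}^d \ U, used for training
ood_x = all_seqs[in_U]     # U, used for OOD evaluation

print(train_x.shape, ood_x.shape)  # torch.Size([128, 8]) torch.Size([128, 8])
```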
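Likewise, the training configuration quoted under Experiment Setup translates directly into PyTorch. In the sketch below, only the loss, the optimizer and its hyperparameters, the schedule, and the iteration count come from the paper; the two-layer network, the random {±1}^8 inputs, and the parity-style targets are hypothetical stand-ins for the FNNs, DEQs, and separation functions the paper actually trains on.

```python
# Minimal sketch of the reported training configuration: ℓ2 (MSE) loss,
# AdamW with learning rate 5e-4 and weight decay 1e-4, and a cosine
# annealing schedule over 1000 iterations. Model, inputs, and targets
# are placeholders, not the paper's constructions.
import torch
import torch.nn as nn

torch.manual_seed(0)
train_x = torch.randint(0, 2, (128, 8)).float() * 2 - 1  # placeholder {±1}^8 inputs
train_y = train_x.prod(dim=1, keepdim=True)              # hypothetical target function

model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
loss_fn = nn.MSELoss()  # ℓ2 loss

for step in range(1000):
    optimizer.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```

With 256 sequences at d = 8, this setup comfortably fits the paper's reported footprint of under 10 MB of memory and a couple of minutes of runtime on a GTX 1660.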